2020 DBMS
Usually, the terms “data” and “information” are used interchangeably. However, there is a subtle difference between the two. In a nutshell, data can be a number, symbol, character, word, code, graph, etc. On the other hand, information is data put into context. Information is utilised by humans in some significant way (such as to make decisions, forecasts, etc.).
A basic example of this transformation is a computer: it uses programming scripts, formulas, or software applications to turn data into information.
Data vs. Information:
Data is an individual unit that contains raw material which does not carry any specific meaning; information is a group of data that collectively carries a logical meaning.
Raw data alone is insufficient for decision making; information is sufficient for decision making.
An example of data is a student’s test score; the average score of a class is information derived from the given data.
The conceptual data model describes the database at a very high level and is useful for understanding the needs or requirements of the database. It is this model that is used in the requirement-gathering process, i.e. before the database designers start building a particular database. One such popular model is the entity/relationship model (ER model). The E/R model describes data in terms of entities, relationships, and attributes. Because of this, requirements can be discussed and understood even with non-computer-science (non-technical) users and stakeholders.
Entity-Relationship Model (ER Model): It is a high-level data model which is used to define the data and the relationships between them. It is basically a conceptual design of a database that makes the view of the data easy to design.
Components of ER Model:
1. Entity: An entity is referred to as a real-world object. It can be a name, place, object, class, etc. Entities are represented by rectangles in an ER diagram.
2. Attributes: An attribute can be defined as a description of the entity. Attributes are represented by ellipses in an ER diagram. For a Student, an attribute can be Age, Roll Number, or Marks.
3. Relationship: Relationships are used to define relations among different entities. Diamonds (rhombuses) are used to show relationships.
This type of data model is used to represent only the logical part of the database and does not represent the physical structure of the database. The representational data model allows us to focus primarily on the design part of the database. A popular representational model is the relational model, which consists of relational algebra and relational calculus. In the relational model, we basically use tables to represent our data and the relationships between them. It is a theoretical concept whose practical implementation is done in the physical data model.
The advantage of using a representational data model is that it provides the foundation on which the physical model is built.
The physical data model is used to practically implement the relational data model. Ultimately, all data in a database is stored physically on a secondary storage device such as disks and tapes. The data is stored in the form of files, records, and certain other data structures. The physical model has all the information on the format in which the files are present, the structure of the database, the presence of external data structures, and their relation to each other. Here, we basically decide how tables are stored so they can be accessed efficiently. In order to come up with a good physical model, we have to work carefully from the relational model. Structured Query Language (SQL) is used to practically implement relational algebra.
This data model describes HOW the system will be implemented using a specific DBMS. It is typically created by DBAs and developers, and its purpose is the actual implementation of the database.
The physical data model describes the data needs of a single project or application, though it may be integrated with other physical data models based on project scope.
The data model contains relationships between tables, addressing the cardinality and nullability of the relationships.
It is developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
Columns have exact datatypes, lengths, and default values assigned.
Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined, as in the sketch below.
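For illustration, a physical model is ultimately expressed in DDL. A minimal sketch follows; the table, column names, and types are invented, and a department table is assumed to exist:

    CREATE TABLE employee (
        emp_id    INT           NOT NULL,        -- exact datatype
        emp_name  VARCHAR(50)   NOT NULL,        -- length assigned
        dept_id   INT           NOT NULL,
        salary    DECIMAL(10,2) DEFAULT 0.00,    -- default value
        PRIMARY KEY (emp_id),                    -- primary key
        FOREIGN KEY (dept_id) REFERENCES department (dept_id)  -- foreign key
    );

    CREATE INDEX idx_employee_dept ON employee (dept_id);  -- index for efficient access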
Some Other Data Models
1. Hierarchical Model
The hierarchical model is one of the oldest data models; it was developed by IBM in the 1960s. In a hierarchical model, data are viewed as a collection of tables, or, we can say, segments that form a hierarchical relation. The data is organized into a tree-like structure where each child record has one parent record, and a parent may have many children. Even when the segments are connected in a chain-like structure by logical associations, the resulting structure can be a fan structure with multiple branches. These logical associations are called directional associations.
2. Network Model
The network model was formalized by the Database Task Group (DBTG) in the 1960s. This model is a generalization of the hierarchical model. A record can have multiple parent segments, and these segments are grouped into levels, but a logical association can exist between segments belonging to any level. Mostly, there exists a many-to-many logical association between any two segments.
3. Object-Oriented Data Model
In the object-oriented data model, data and their relationships are contained in a single structure which is referred to as an object. Real-world problems are represented as objects with different attributes, and objects can have multiple relationships between them. Basically, it is a combination of object-oriented programming and the relational database model.
4. Flat Data Model
The flat data model is basically a two-dimensional array in which no element is duplicated. This data model has one drawback: it cannot store a large amount of data, i.e. the tables cannot be of large size.
5. Context Data Model
The context data model is simply a data model which combines more than one data model. For example, a context data model may consist of the ER model, the object-oriented data model, etc. This model allows users to do more than any one of the individual data models can do.
6. Semi-Structured Data Model
Semi-structured data models deal with data in a flexible way: some entities may have extra attributes and some entities may have missing attributes. Basically, data can be represented here in a flexible way.
Advantages of Data Models
1. Data models help us represent data accurately.
2. They help us find missing data and also minimize data redundancy.
3. A data model provides data security in a better way.
4. The data model should be detailed enough to be used for building the physical database.
5. The information in the data model can be used for defining the relationships between tables, primary and foreign keys, and stored procedures.
Disadvantages of Data Models
1. In the case of a vast database, it can become difficult to understand the data model.
2. You must have proper knowledge of SQL to use physical models.
3. Even a small change made in the structure requires modification of the entire application.
4. There is no set data manipulation language in DBMS.
5. To develop a data model, one should know the characteristics of how the data is physically stored.
Conclusion
Data modeling is the process of developing a data model for the data to be stored in a database.
Data models ensure consistency in naming conventions, default values, semantics, and security while ensuring the quality of the data.
The data model structure helps to define the relational tables, primary and foreign keys, and stored procedures.
There are three types of data models: conceptual, logical, and physical.
The main aim of the conceptual model is to establish the entities, their attributes, and their relationships.
The logical data model defines the structure of the data elements and sets the relationships between them.
A physical data model describes the database-specific implementation of the data model.
The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
The biggest drawback is that even a small change made in the structure requires modification of the entire application.
DML (Data manipulation language) – Data manipulation language provides commands for manipulating data
in databases.
Common DML commands are the following; a short example of each appears after the list.
a. Select – This command is used to display information from the database.
b. Insert – This command is used to insert new records in the database.
c. Delete – This command is used to delete records from the database.
d. Update – This command is used to modify records in the database.
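For illustration, here is a minimal sketch of each DML command applied to a hypothetical student table (the table and column names are assumptions):

    SELECT * FROM student;                                    -- display records
    INSERT INTO student (roll_no, name) VALUES (1, 'Avi');    -- insert a new record
    UPDATE student SET name = 'Ayush' WHERE roll_no = 1;      -- modify a record
    DELETE FROM student WHERE roll_no = 1;                    -- delete a record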
The ER model is a very important concept in DBMS, used for modeling the logical view of the system from a data perspective. Entity, Entity Type, and Entity Set are all very important concepts of the ER model. Below, we will understand the difference between them.
1. Entity : An entity is a thing in the real world with independent existence. An entity can exist independently and is distinguishable from other objects; it can be identified uniquely.
An entity can be of two types :
Tangible Entity : Entities that exist in the real world physically. Example: Person, car, etc.
Intangible Entity : Entities that exist only logically and have no physical existence. Example: Bank
Account, etc.
Example :
A student with a particular roll number is an entity.
A company with a particular registration number is an entity.
Note :
An entity may be concrete like a student or a book, or abstract like a holiday or a particular concept.
An entity is represented by a set of attributes.
In a particular relation in RDBMS, a particular record is called an entity.
2. Entity Type : It refers to the category that a particular entity belongs to.
Example :
A table named student in a university database.
A table named employee in a company database.
Note :
The category of a particular entity in the relation in RDBMS is called the entity type.
It is represented by the name of the table and its schema.
3. Entity Set : An entity set is a collection or set of all entities of a particular entity type at any point in time.
The type of all the entities should be the same.
Example :
The collection of all the students from the student table at a particular instant of time is an example of an
entity set.
The collection of all the employees from the employee table at a particular instant of time is an example
of an entity set.
Note :
Entity sets need not be disjoint. For example, the entity set of Article Writer (all content creators for
Website A) and the entity set of Article Reader (all people who read the article of Website A) may have
members in common.
The collection of all the entities in the relation of RDBMS is called an entity set.
STUDENT Table:
ID   Name     Age   Gender
1    Avi      19    M
2    Ayush    23    M
3    Nikhil   21    M
4    Riya     16    F

Entity : Each row of the table is an entity; for example, the record (1, Avi, 19, M) is an entity.
Entity Type : Each entity belongs to the student type. Hence, the type of entity here is student.
Entity Set : The complete set of all entities is called the entity set. For the above table, the records with student IDs 1, 2, 3, and 4 form the entity set.
Difference Table :
Entity : Any particular row (a record) in a relation (table) is known as an entity.
Entity Type : The name of a relation (table) in RDBMS is an entity type.
Entity Set : All rows of a relation (table) in RDBMS form an entity set.
This operator is also called the MINUS operator. The EXCEPT operation finds the difference between two queries, and the result comprises the rows that belong only to the first query.
In SQL, EXCEPT returns those tuples that are returned by the first SELECT operation and not returned by the second SELECT operation.
Example:
Select all employee names in the ‘sales’ department who are not involved in a project called ‘Product X’.
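A sketch of this query, assuming hypothetical tables employee(name, dept_name) and works_on(employee_name, project_name), in an SQL implementation that supports EXCEPT:

    SELECT name
    FROM employee
    WHERE dept_name = 'sales'
    EXCEPT
    SELECT employee_name
    FROM works_on
    WHERE project_name = 'Product X';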
AVG
The AVG() function returns the average of the values in a set. The following illustrates the syntax of the AVG() function:
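    AVG([ALL | DISTINCT] expression)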
The ALL keyword instructs the AVG() function to calculate the average of all values, while the DISTINCT keyword forces the function to operate on distinct values only. By default, the ALL option is used.
The following example shows how to use the AVG() function to calculate the average salary of each
department:
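A sketch of such a query, assuming a hypothetical employees table with department_id and salary columns:

    SELECT department_id, AVG(salary)
    FROM employees
    GROUP BY department_id;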
MIN
The MIN() function returns the minimum value of a set. The following illustrates the syntax of
the MIN() function:
For example, the following statement returns the minimum salary of the employees in each department:
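The syntax, followed by a sketch of the per-department query (using the same hypothetical employees table as above):

    MIN(expression)

    SELECT department_id, MIN(salary)
    FROM employees
    GROUP BY department_id;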
MAX
The MAX() function returns the maximum value of a set. The MAX() function has the following syntax:
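The syntax; a per-department example would be analogous to MIN():

    MAX(expression)

    SELECT department_id, MAX(salary)
    FROM employees
    GROUP BY department_id;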
COUNT
The COUNT() function returns the number of items in a set. The following shows the syntax of
the COUNT() function:
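The syntax, with a simple sketch that counts the rows of the hypothetical employees table:

    COUNT([ALL | DISTINCT] expression)    -- COUNT(*) counts all rows

    SELECT COUNT(*)
    FROM employees;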
SUM
The SUM() function returns the sum of all values. The following illustrates the syntax of the SUM() function:
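The syntax, with a per-department sketch:

    SUM([ALL | DISTINCT] expression)

    SELECT department_id, SUM(salary)
    FROM employees
    GROUP BY department_id;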
What is rollback?
A rollback is the operation of restoring a database to a previous state by canceling a specific transaction or
transaction set. Rollbacks are either performed automatically by database systems or manually by users.
When a database user changes a data field but has not yet saved the change, the data is stored in a
temporary state or transaction log. Users querying the unsaved data see the unchanged values. The action of
saving the data is a commit; this allows subsequent queries for this data to show the new values.
However, a user may decide not to save the data. Under this condition, a rollback command discards any changes made by the user since the transaction began. Thus, a rollback occurs when a user begins changing data, realizes the wrong record is being updated, and then cancels the operation to undo any pending changes.
Rollbacks also may be issued automatically after a server or database crash, e.g. after a sudden power loss.
When the database restarts, all logged transactions are reviewed; then all pending transactions are rolled
back, allowing users to reenter and save appropriate changes.
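A minimal sketch of a manual rollback (the table and values are illustrative; transaction-start syntax varies slightly between DBMSs, e.g. START TRANSACTION in MySQL):

    BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
    -- wrong record updated; discard the pending change instead of committing
    ROLLBACK;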
SQL GRANT Command:
The GRANT command gives users access privileges to database objects. In the GRANT syntax:
privilege_name is the access right or privilege granted to the user. Some of the access rights are ALL, EXECUTE, and SELECT.
object_name is the name of a database object like a TABLE, VIEW, STORED PROCEDURE, or SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are sets of privileges grouped together.
WITH GRANT OPTION allows a user to grant access rights to other users.
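The syntax these parameters refer to has the general form:

    GRANT privilege_name
    ON object_name
    TO {user_name | PUBLIC | role_name}
    [WITH GRANT OPTION];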
For example: GRANT SELECT ON employee TO user1; This command grants SELECT permission on the employee table to user1. You should use WITH GRANT OPTION carefully: if you GRANT the SELECT privilege on the employee table to user1 using WITH GRANT OPTION, then user1 can GRANT the SELECT privilege on the employee table to another user, such as user2. Later, if you REVOKE the SELECT privilege on employee from user1, user2 will still have the SELECT privilege on the employee table.
SQL REVOKE Command:
The REVOKE command removes user access rights or privileges to the database objects.
The Syntax for the REVOKE command is:
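    REVOKE privilege_name
    ON object_name
    FROM {user_name | PUBLIC | role_name};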
For example: REVOKE SELECT ON employee FROM user1; This command will REVOKE the SELECT privilege on the employee table from user1. When you REVOKE the SELECT privilege on a table from a user, the user will no longer be able to SELECT data from that table. However, if the user has received SELECT privileges on that table from more than one user, he/she can SELECT from that table until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially granted by you.
You can use this stored data for computing and presentation. In many systems, data independence is an essential function for components of the system. A database system normally contains three levels of schema:
1. Physical/Internal
2. Conceptual
3. External
With physical independence, you can easily change the physical storage structures or devices without affecting the conceptual schema. Any change made will be absorbed by the mapping between the conceptual and internal levels. Physical data independence is achieved by the presence of the internal level of the database and the transformation from the conceptual level of the database to the internal level.
With logical independence, you can change the conceptual schema without having to change:
1. External views
2. External APIs or programs
Any change made will be absorbed by the mapping between the external and conceptual levels.
Compared to physical data independence, logical data independence is more challenging to achieve.
Logical data independence: modification at the logical level is significant whenever the logical structures of the database are changed.
Physical data independence: modifications made at the internal level may or may not be needed to improve the performance of the structure.
Summary
Data independence is the property of a DBMS that lets you change the database schema at one level of the database system without having to change the schema at the next higher level.
The two levels of data independence are 1) physical and 2) logical.
Physical data independence separates the conceptual level from the internal/physical level.
Logical data independence is the ability to change the conceptual schema without changing the external views.
Compared to physical data independence, logical data independence is more challenging to achieve.
Data independence helps you to improve the quality of the data.
Query processing involves the following components:
Parser/Translator
Optimizer
Execution Plan
Query Execution Engine
1. Parser/Translator: When we submit any query, the task of the parser is to check the syntax and verify the query: it checks whether the query syntax is correct and also checks the query semantics. An SQL command is in a high-level language, so we need to convert it to a low-level language. The translator converts the SQL query into a relational algebra expression, i.e. from a high-level language to a low-level language.
2. Optimizer: As we know, a submitted query can be converted into several equivalent relational algebra expressions. The optimizer selects the expression (plan) having the lowest cost; for this, the optimizer uses statistics about the data.
3. Execution Plan: In this phase, the database decides the order of execution of the given query's instructions. For a simple SELECT-FROM-WHERE query, the database will first execute the FROM clause, then WHERE, and finally SELECT, so the order of execution is as follows:
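An illustrative query (the table and column names are invented):

    SELECT name FROM student WHERE marks > 50;

    -- Order of execution:
    -- 1. FROM student       (identify the source table)
    -- 2. WHERE marks > 50   (filter the rows)
    -- 3. SELECT name        (project the requested column)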
4. Query Execution Engine: The query execution engine takes a physical query plan (that is, the execution plan), executes the plan, and returns the result. Examples of query engines are Apache Spark, Apache Drill, Presto, etc. So basically the query execution engine interprets the SQL command, accesses data from the database, and finally returns the answer to the given SQL query. There are many functions performed by the query execution engine, some of them being:
It acts as a dispatcher for all commands in the execution plan. It iterates through all the commands in the
plan until the batch is complete.
It interacts with the storage engine to retrieve and update data from tables and indexes.
Structural constraints are also called structural properties of a database management system (DBMS). Cardinality ratios and participation constraints taken together are called structural constraints. The name “constraints” refers to the fact that such limitations must be imposed on the data for the DBMS to be consistent with the requirements.
Structural constraints are represented by min-max notation: a pair of numbers (m, n) that appears on the connecting line between an entity and its relationship. The minimum number of times the entity can appear in the relationship is represented by m, whereas the maximum number of times is denoted by n. If m is 0, the entity participates in the relationship partially, whereas if m is greater than or equal to 1, the entity's participation is total. Note – the number of times an entity participates in a relationship is the same as the number of appearances of the entity in the relationship's tuples.
SELECT COMMAND: It is used for viewing or retrieving data from a table. We can use logical operators as well as relational operators within the SELECT command.
The SELECT clause is mandatory and carries out the relational project operation.
The FROM clause is also mandatory. It identifies one or more tables and/or views from which to retrieve the
column data displayed in a result table.
The WHERE clause is optional and carries out the relational select operation. It specifies which rows are to be
selected.
The GROUP BY clause is optional. It organizes data into groups by one or more column names listed in the
SELECT clause.
The optional HAVING clause sets conditions regarding which groups to include in a result table. The groups are
specified by the GROUP BY clause.
The ORDER BY clause is optional. It sorts query results by one or more columns in ascending or descending
order.
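For example, the statement described below is presumably:

    SELECT * FROM Customers
    ORDER BY country;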
Here, the SQL command selects all rows from the Customers table and then sorts them in ascending order
by country.
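The general syntax being described:

    SELECT column1, column2, ...
    FROM table
    ORDER BY columnA, columnB, ...;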
Here,
column1, column2, ... are the columns to be included in the result set
table is the name of the table from where the rows are selected
columnA, columnB, ... are the column(s) based on which the rows will be ordered
We can use the ASC keyword to explicitly sort selected records in ascending order. For example,
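The statement being described is presumably:

    SELECT * FROM Customers
    ORDER BY age ASC;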
Here, the SQL command selects all the rows from the Customers table and then sorts them in ascending order by age.
Note: The ORDER BY clause sorts result set in ascending by default, even without the ASC clause.
We use the DESC keyword to sort the selected records in descending order. For example,
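The statement being described is presumably:

    SELECT * FROM Customers
    ORDER BY age DESC;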
Here, the SQL command selects all the customers and then sorts them in descending order by age.
ORDER BY With Multiple Columns
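A sketch of such a query (reconstructed from the description below):

    SELECT * FROM Customers
    ORDER BY first_name, age;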
Here, the SQL command selects all the records and then sorts them by first_name. If the first_name repeats
more than once, it sorts those records by age.
ORDER BY With WHERE
We can also use ORDER BY with the SELECT WHERE clause. For example,
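A sketch of such a query (reconstructed from the description below, using <> for “not equal”):

    SELECT last_name, age
    FROM Customers
    WHERE country <> 'UK'
    ORDER BY last_name DESC;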
Here,
The SQL command first selects the last_name and age fields from the Customers table if their country is
not UK.
Then, the selected records are sorted in descending order by their last_name.
19. General anomalies and functional dependency in a relation
A functional dependency (FD) is a relationship between two attributes, typically between the PK and other
non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X (usually
the PK), if for every valid instance of X, that value of X uniquely determines the value of Y. This relationship is
indicated by the representation below :
X ———–> Y
The left side of the above FD diagram is called the determinant, and the right side is the dependent. Here are
a few examples.
In the first example below, SIN determines Name, Address, and Birthdate. Given SIN, we can determine any of the other attributes within the table.
SIN ———> Name, Address, Birthdate
For the second example, SIN and Course together determine the date completed (DateCompleted). This must also work for a composite PK.
SIN, Course ———> DateCompleted
STUDENT Table
STUD_NO STUD_NAME STUD_PHONE STUD_STATE STUD_COUNTRY STUD_AGE
Table 1
STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
Table 2
Insertion anomaly: If a tuple is inserted in the referencing relation and the referencing attribute value is not present in the referenced attribute, insertion into the referencing relation is not allowed.
Example: If we try to insert a record in STUDENT_COURSE with STUD_NO = 7, it will not be allowed, since no student with STUD_NO = 7 exists in STUDENT.
Deletion and updation anomaly: If a tuple is deleted or updated in the referenced relation and the referenced attribute value is used by a referencing attribute in the referencing relation, deletion of the tuple from the referenced relation is not allowed.
Example: If we want to update a record in STUDENT_COURSE with STUD_NO = 1, we have to update it in both rows of the table. If we try to delete a record from STUDENT with STUD_NO = 1, it will not be allowed.
To avoid this, the following referential actions can be used when defining the foreign key (a sketch follows the list):
ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated from referenced relation and the
referenced attribute value is used by referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and set the value of referencing attribute to NULL.
ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated from referenced relation and the
referenced attribute value is used by referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and referencing relation as well.
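A sketch of how these actions are declared, based on the STUDENT/STUDENT_COURSE example (the column types are assumptions):

    CREATE TABLE STUDENT_COURSE (
        STUD_NO    INT,
        COURSE_NO  VARCHAR(10),
        FOREIGN KEY (STUD_NO) REFERENCES STUDENT (STUD_NO)
            ON DELETE CASCADE    -- or ON DELETE SET NULL
            ON UPDATE CASCADE    -- or ON UPDATE SET NULL
    );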
Insertion Anomalies: These anomalies occur when it is not possible to insert data into a database
because the required fields are missing or because the data is incomplete. For example, if a database
requires that every record has a primary key, but no value is provided for a particular record, it cannot be
inserted into the database.
Deletion anomalies: These anomalies occur when deleting a record from a database and can result in
the unintentional loss of data. For example, if a database contains information about customers and orders,
deleting a customer record may also delete all the orders associated with that customer.
Update anomalies: These anomalies occur when modifying data in a database and can result in
inconsistencies or errors. For example, if a database contains information about employees and their
salaries, updating an employee’s salary in one record but not in all related records could lead to incorrect
calculations and reporting.
Removal of Anomalies
These anomalies can be avoided or minimized by designing databases that adhere to the principles of
normalization. Normalization involves organizing data into tables and applying rules to ensure data is stored
in a consistent and efficient manner. By reducing data redundancy and ensuring data integrity, normalization
helps to eliminate anomalies and improve the overall quality of the database
According to E. F. Codd, the inventor of the relational database model, the goals of normalization include:
removing all the repeated data from the database;
removing undesirable deletion, insertion, and update anomalies;
making proper and useful relationships between tables.
FAQs
1. What is Normalization?
Normalization is the process of splitting the tables into smaller ones so as to remove anomalies in the database. It helps in reducing redundancy in the database.
Advantages of the Relational Model:
Data Integrity: Relational databases enforce data integrity through various constraints such as primary
keys, foreign keys, and referential integrity rules, ensuring that the data is accurate and consistent.
Scalability: Relational databases are highly scalable and can handle large amounts of data without
sacrificing performance.
Flexibility: The relational model allows for flexible querying of data, making it easier to retrieve specific
information and generate reports.
Security: Relational databases provide robust security features to protect data from unauthorized access.
Disadvantages:
Redundancy: The relational model can result in data redundancy, where the same data is stored in multiple
places, leading to inefficiency and potential data inconsistencies.
Complexity: Creating and maintaining a relational database can be complex and time-consuming, requiring
specialized knowledge and skills.
Performance: As the size of the database grows, performance can suffer due to the need to join multiple
tables to retrieve information.
Inability to handle unstructured data: The relational model is not well-suited for handling unstructured or
semi-structured data, such as images, videos, and text documents.
2. What are Anomalies in the Relational Model?
An anomaly is a fault in the database that arises because of poor maintenance and poor storage of the data in a flat database. Normalization is the process of removing anomalies from the database.
3. How Anomalies can be removed?
Anomalies can be removed with the process of Normalization. Normalization involves organizing data into
tables and applying rules to ensure data is stored in a consistent and efficient manner.
Here are a few additional ideas for database security control measures.
1. Data quality
The database community needs techniques and organizational solutions to assess and attest to data quality.
These techniques include simple mechanisms such as quality stamps posted on websites. It also requires
techniques to provide more effective integrity semantics verification and tools for assessing data quality based
on techniques such as record linkage. Application-level recovery techniques are also needed for automatically
repairing incorrect data. The Extract Transform Load tools widely used to load data in data warehouses are
grappling with these issues.
2. Database Survivability
Database systems must operate and continue their functions, even with reduced capabilities, despite disruptive events such as information warfare attacks. In addition to making every effort to prevent an attack and to detect one in the event of occurrence, a DBMS should be able to do the following:
a. Confinement – Take immediate action to eliminate the attackers’ access to the system and to isolate or
contain the problem to prevent further spread.
b. Damage assessment – Determine the extent of the problem, including failed functions and corrupted data.
c. Reconfiguration – Reconfigure to allow the operation to continue in a degraded mode while recovery
proceeds.
d. Repair – Repair corrupted or lost data and repair or reinstall failed system functions to reestablish an
average level of operation.
e. Fault treatment – To the extent possible, identify the weakness exploited in the attack and take steps to
prevent a recurrence.
The goal of an information warfare attacker is to damage the organization's operation and the fulfilment of its mission through the disruption of its information systems. The specific target of an attack may be the system itself or its data. While attacks that bring the system down outright are severe and dramatic, they must also be well timed to achieve the attacker's goal, since attacks will receive immediate and concentrated attention in order to bring the system back to operational condition, diagnose how the attack took place, and install preventive measures.
Conclusion
Factors such as the evolution of security concerns, the disintermediation of data access, and new computing paradigms and applications, such as grid-based computing and on-demand business, have introduced new security requirements and new contexts in which to apply and possibly extend current approaches to achieving data security.
These information security measures include access control, auditing, authentication, encryption, integrity
controls, backups and application security. The security designs for specific database systems specify security
administration and management functions such as administration and reporting of user access rights, log
management and analysis, database replication or synchronization and backups, along with various business-
driven information security controls within the database programs and functions, for example, data entry
validation and audit trails. Also, various security-related activities, such as manual controls, are generally
incorporated into the procedures and guidelines relating to database design, development, configuration, use,
management and maintenance.
22. Database users and how they interact with the database:
Database users are categorized based upon their interaction with the database. The following types of database users exist in a DBMS.
1. Database Administrator (DBA) : A Database Administrator (DBA) is a person/team who defines the schema and also controls the three levels of the database. The DBA creates a new account ID and password for a user if he/she needs to access the database. The DBA is also responsible for providing security to the database and allows only authorized users to access/modify it. The DBA is accountable for problems such as security breaches and poor system response time.
The DBA also monitors recovery and backup and provides technical support.
The DBA has a DBA account in the DBMS, which is called a system or superuser account.
The DBA repairs damage caused by hardware and/or software failures.
The DBA is the one having privileges to perform DCL (Data Control Language) operations such as GRANT and REVOKE, to allow/restrict a particular user from accessing the database.
2. Naive / Parametric End Users : Parametric end users are unsophisticated users who don't have any DBMS knowledge but frequently use database applications in their daily life to get the desired results. For example, railway ticket booking users are naive users. Clerks in any bank are naive users because they don't have any DBMS knowledge but still use the database to perform their given task.
3. System Analyst : A system analyst is a user who analyzes the requirements of parametric end users. They check whether all the requirements of the end users are satisfied.
4. Sophisticated Users : Sophisticated users can be engineers, scientists, or business analysts who are familiar with the database. They can develop their own database applications according to their requirements. They don't write program code, but they interact with the database by writing SQL queries directly through the query processor.
5. Database Designers : Database designers are the users who design the structure of the database, which includes tables, indexes, views, triggers, stored procedures, and constraints, usually enforced before the database is created or populated with data. They control what data must be stored and how the data items are to be related. It is the responsibility of database designers to understand the requirements of the different user groups and then create a design which satisfies the needs of all the user groups.
6. Application Programmers : Application programmers, also referred to as system analysts or simply software engineers, are the back-end programmers who write the code for the application programs. They are computer professionals. These programs could be written in programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL, etc. Application programmers design, debug, test, and maintain a set of programs called “canned transactions” for the naive (parametric) users in order to interact with the database.
7. Casual Users / Temporary Users : Casual users are users who occasionally use/access the database, but each time they access the database they require new information; for example, a middle- or higher-level manager.
8. Specialized Users : Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Among these applications are computer-aided design systems, knowledge-base and expert systems, etc.
The first best practice for naming conventions in data modeling is to write down all the criteria defining the adopted naming convention. So that it is always visible and at hand, it should be included as a text annotation together with the entity-relationship diagram (ERD). If you use a database design tool like Vertabelo, your naming conventions can be documented in virtual sticky notes that will always be attached to your ERDs. This and other database modeling tips can save you time and effort when doing data modeling work.
In software systems that are used and developed in a single country, there’s probably no need to specify the
language. But if a system involves designers or developers of different nationalities (which is becoming
increasingly common), language choice is not trivial. In such situations, the language used to name schema
objects must be clearly specified in the naming convention and must be respected. This way, there will be no
chances of finding names in different languages.
Database tables represent real-world entities, so it is appropriate to use nouns when choosing their names.
When considering database table naming conventions, you must make a decision that seems trivial but is
actually crucial: use plural or singular nouns for the names. (In the case of database column naming
conventions, this problem does not arise, as column names are always singular.)
Some say that the singular should be used because tables represent a single entity, not a collection of things.
For example: Client , Item , Order , Invoice , etc. Others prefer to use the plural, considering that the table is
a container for a collection of things. So, with the same criteria as they would label a box for storing toys, they
give the tables plural names: Customers, Items, Orders, Invoices.
While it is possible to use spaces and any printable character to name tables and other database objects, this
practice is strongly discouraged. The use of spaces and special characters requires that object names be
enclosed in delimiters so that they do not invalidate SQL statements.
For example, if you have a table in your schema called Used Cars (with a space between Used and Cars ),
you’d have to write the table name between delimiters in any query that uses it. For example:
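(Delimiters vary by DBMS: double quotes in standard SQL, square brackets in SQL Server, backticks in MySQL.)

    SELECT * FROM "Used Cars";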
Case Sensitivity
SQL is case-insensitive, so the case-sensitivity of object names is irrelevant when writing queries or issuing
SQL commands to your database. However, a good practice for database schema naming is to clearly define
a case-sensitive criteria for object names. The criteria adopted will affect the readability of the schema, its
neatness, and the interpretation of its elements.
For example, you can use names with all capital letters for tables and upper/lower case for columns. This will
make it easier to identify objects in database diagrams as well as in SQL statements and other database tools.
Compound Names
The ideal name for any database object should strike the optimal balance between synthesis and self-
explanation. Ideally, each name should contain an explanation of what it represents in the real world and also
be able to be synthesized in one word. This is easy to achieve in some cases, particularly when you create a
conceptual schema with tables that contain information on tangible
elements: Users , Employees , Roles , Payrolls , etc.
But as you get a bit more detailed in your diagrams, you will come across elements that cannot be self-
explanatory with a single word. You will then have to define names for objects that represent, for example,
roles per user, payroll items, ID numbers, joining dates, and many others.
Another good schema naming practice is to adopt clear criteria for the use of compound names. Otherwise,
each designer or programmer will use their own criteria – and your schema will quickly become full of random
names.
There are two popular naming options for using compound names. You can either use camel case
(e.g. PayrollItems or DateOfBirth ) or you can separate the words with an underscore,
(e.g. PAYROLL_ITEMS or DATE_OF_BIRTH ).
Using underscore as a word separator in compound names is the way to go for capitalized names; it ensures
the names can be easily read at a glance.
Abbreviations
Using abbreviations for object names is inadvisable, but so is using names that are too long. Ideally, a middle
ground should be found. For example, I recommend only abbreviating object names if the name exceeds 20
characters.
If you make heavy use of abbreviations because many objects in your schema have long names, the list of
abbreviations to be used should be explicit for all users of your schema. In addition, this list should be part of
the naming convention of your schema. You can add a sticky note to your diagrams where you explicitly detail
all abbreviations used along with their extended meaning.
Abbreviations and prefixes should be avoided; they only cause confusion to everyone that needs to work with
the database.
Some people use prefixes or suffixes to denote an element’s type so that it can be easily identified without
referencing the schema description or the database metadata. For example, they add the prefix T_ to all
tables and V_ to all views. Or they add a suffix that denotes the data type of each column.
Using suffixes or prefixes may result in two objects of different types with similar names. For example, you
could easily have a table named T_CUSTOMERS and a view named V_CUSTOMERS . Whoever has to
write a query may not know which of the two should be used and what the difference is between them.
Remember that a view name should indicate its purpose. It would be more helpful if the view name were, for
example, NEW_CUSTOMERS , indicating that it is a subset of the CUSTOMERS table.
Using a suffix indicating the data type of each column does not add useful information. The exception is when
you need to use a counter-intuitive data type for a column. For example, if you need a column to store a date
in integer format, then you could use the int suffix and name the column something like Date_int .
Another common (but discouraged!) practice is to prefix each column name with an abbreviation of the table
name. This adds unnecessary redundancy and makes queries difficult to read and write.
Prefixes for Naming Dependent Objects
A prefix denoting the type of object is considered good practice when naming table- or column-dependent
objects (e.g. indexes, triggers, or constraints). Such objects are not usually represented in database diagrams,
instead being mentioned in schema metadata queries, in logs or execution plans, or in error messages thrown
by the database engine. Some commonly used prefixes are, for example, PK_ for primary key constraints, FK_ for foreign keys, UQ_ for unique constraints, and IDX_ (or IX_) for indexes; the exact set varies from team to team.
The suggested way to use these prefixes is to concatenate them with the table name and an additional
element denoting the function the constraint performs. In a foreign key constraint, we might indicate the table
at the other end of the constraint; in an index, we might indicate the column names that compose this index. A
foreign key constraint between the Customers table and the Orders table could be
named FK_Customers_Orders .
This way of naming dependent objects makes it easier to relate the object to the table(s) on which it depends.
This is important when it is mentioned in an execution log or error message.
Since you don’t usually have to write the names of dependent objects (like foreign keys or indexes) in SQL
statements, it’s not really important if they are long or do not meet the same naming criteria as objects like
tables, fields, or views.
Another commonly accepted use for prefixes is to quickly distinguish sets of objects that belong to a functional
or logical area of the schema. For example, in a data warehouse schema prefixes let us distinguish dimension
tables from fact tables. They can also distinguish tables with “cold data” from tables with “hot data”, if this kind
of distinction is a top priority in your schema.
In a schema that is used by different applications, prefixes can help to easily identify the tables that are used
by each application. For example, you can establish that tables starting with INV belong to an invoicing app
and those starting with PAY belong to a payroll app.
As I recommended above for abbreviations, these prefixes need to be made explicit in the database diagram,
either through sticky notes or some other form of documentation. This form of grouping by prefix will make it
easier to manage object permissions according to the application that uses them.
It is quite common to create views in a database schema to facilitate the writing of queries by solving common
filtering criteria. In a schema that stores membership information, for example, you could create a view of
approved memberships. This would save database developers the task of finding out what conditions a
membership must meet to be approved.
Naming Views
For the above reason, it is common for view names to consist of the table name plus a qualifier designating
the purpose of that view. Since views named in this way often have compound names, you should use the
criteria you’ve adopted for compound names. In the example above, the view might be
called ApprovedMemberships or APPROVED_MEMBERSHIPS , depending on the criteria chosen for
compound names. In turn, you could create a view of memberships pending approval
called PendingMemberships or PENDING_MEMBERSHIPS .
Since views are used as if they were tables, it is good practice that their names follow the same naming
convention as table names – e.g. if you use all uppercase for table names, you should also use all uppercase
for view names.
It is a good practice to name views after their “mother” table (when there is one), adding a qualifier that
designates the purpose of that view.
It’s important to make views known. Anyone who uses the database for querying or design work should know
that there are views that can simplify their work.
One way to force users to use views is to restrict access to tables. This ensures that users use the views and
not the tables and that there is no confusion about how to filter the data to get subsets of the tables.
In the case of the membership schema mentioned above, you can restrict access to the Memberships table
and only provide access to the ApprovedMemberships and PendingMemberships views. This ensures that
no one has to define what criteria to use to determine whether a membership is approved or pending.
It is also good practice to include the views in the database diagram and explain their usefulness with sticky
notes. Any user looking at the diagram will also be aware of the existence of the views.
Compliance and Practicality for Database Naming Conventions
Naming convention criteria cannot be enforced by the database engine. This means that compliance must be
overseen by a designer who controls the work of anyone who has permissions to create or modify the
structure of a database. If no one is charged with overseeing naming convention adherence, it is of no use.
While intelligent database design tools such as Vertabelo help ensure that certain naming criteria are met, full
monitoring of the criteria requires a trained human eye.
On the other hand, the best way to enforce the criteria of a naming convention is for those criteria to be useful
and practical. If they are not, users will comply with them reluctantly and drop them as soon as they can. If you
have been given the task of defining a database schema naming convention, it is important that you create it
with the purpose of benefiting the users. And make sure all users are clear about those benefits so they’ll
comply with the convention without protest.
The DDL Commands in Structured Query Language are used to create and modify the schema of the database
and its objects. The syntax of DDL commands is predefined for describing the data. The commands of Data
Definition Language deal with how the data should exist in the database.
1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command
CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database objects.
Example 1: This example describes how to create a new database using the CREATE DDL command.
Suppose you want to create a Books database in SQL. To do this, you have to write the following DDL command:
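    CREATE DATABASE Books;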
Example 2: This example describes how to create a new table using the CREATE DDL command.
Syntax to create a new table:
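    CREATE TABLE table_name (
        column1 datatype,
        column2 datatype,
        ...
    );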
Suppose you want to create a Student table with five columns in the SQL database. To do this, you have to write the following DDL command:
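A sketch with five illustrative columns (the original column list is not specified):

    CREATE TABLE Student (
        Roll_No  INT PRIMARY KEY,
        Name     VARCHAR(50),
        Age      INT,
        Marks    INT,
        City     VARCHAR(30)
    );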
Example 3: This example describes how to create a new index using the CREATE DDL command.
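A sketch (the index and column names are illustrative):

    CREATE INDEX idx_student_name ON Student (Name);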
DROP Command
DROP is a DDL command used to delete/remove database objects from the SQL database. We can easily remove an entire table, view, or index from the database using this DDL command.
Example 1: This example describes how to remove a database from the SQL database.
Suppose you want to delete the Books database from the SQL database. To do this, you have to write the following DDL command:
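    DROP DATABASE Books;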
ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the database, and it also changes
the schema of database objects.
We can also add and drop constraints of the table using the ALTER command.
Example 1: This example shows how to add a new field to the existing table.
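A sketch, using the Student table from earlier and an illustrative new field:

    ALTER TABLE Student ADD Email VARCHAR(50);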
RENAME Command
RENAME is a DDL command which is used to change the name of the database table.
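The exact syntax varies by DBMS; in MySQL, for example:

    RENAME TABLE Student TO Student_Details;

(Many systems instead use ALTER TABLE Student RENAME TO Student_Details;)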
25. Summarize normal forms based on primary keys and corresponding normalization process:
In database management systems (DBMS), normal forms are a series of guidelines that help to ensure that
the design of a database is efficient, organized, and free from data anomalies. There are several levels of
normalization, each with its own set of guidelines, known as normal forms.
1. First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell should
contain only a single value, and each column should have a unique name. The first normal form helps to
eliminate duplicate data and simplify queries.
2. Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute be
dependent on the primary key. This means that each column should be directly related to the primary
key, and not to other columns.
3. Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes are independent of
each other. This means that each column should be directly related to the primary key, and not to any
other columns in the same table.
4. Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each determinant in
a table is a candidate key. In other words, BCNF ensures that each non-key attribute is dependent only
on the candidate key.
5. Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not
contain any multi-valued dependencies.
6. Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves decomposing a table
into smaller tables to remove data redundancy and improve data integrity.
Normal forms help to reduce data redundancy, increase data consistency, and improve database performance. However, higher levels of normalization can lead to more complex database designs and queries. It is important to strike a balance between normalization and practicality when designing a database.
Reduced data redundancy: Normalization helps to eliminate duplicate data in tables, reducing the
amount of storage space needed and improving database efficiency.
Improved data consistency: Normalization ensures that data is stored in a consistent and organized
manner, reducing the risk of data inconsistencies and errors.
Simplified database design: Normalization provides guidelines for organizing tables and data
relationships, making it easier to design and maintain a database.
Improved query performance: Normalized tables are typically easier to search and retrieve data from,
resulting in faster query performance.
Easier database maintenance: Normalization reduces the complexity of a database by breaking it down
into smaller, more manageable tables, making it easier to add, modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database efficiency, and
simplify database design and maintenance.
1. First Normal Form –
A relation is in first normal form if it does not contain any composite or multi-valued attribute, i.e. if every attribute in the relation is a single-valued attribute. A relation that contains a composite or multi-valued attribute violates first normal form.
Example 1 – Relation STUDENT in Table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in Table 2.
Example 2 –
ID   Name   Courses
------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3
In the above table, Course is a multi-valued attribute, so the relation is not in 1NF. The table below is in 1NF, as there is no multi-valued attribute:
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
2. Second Normal Form –
To be in second normal form, a relation must be in first normal form and must not contain any partial dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute which is not part of any candidate key) is dependent on any proper subset of any candidate key of the table. Partial dependency – if a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.
Example 1 – Consider Table 3, shown below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
(Note that there are many courses having the same course fee.)
Here, COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO; and COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO. Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. A non-prime attribute (COURSE_FEE) being dependent on a proper subset of the candidate key is a partial dependency, so this relation is not in 2NF. To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1:
STUD_NO  COURSE_NO
1        C1
2        C2
1        C4
4        C3
4        C1
2        C5

Table 2:
COURSE_NO  COURSE_FEE
C1         1000
C2         1500
C3         1000
C4         2000
C5         2000

NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are 100 students taking the C1 course, we don't need to store its fee as 1000 for all 100 records; instead, we can store it once in the second table as the course fee for C1.
Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper subset of AB determines any non-prime attribute. Hence, the relation is in 2NF.
3. Third Normal Form –
A relation is in third normal form if it is in 2NF and, for every functional dependency X -> Y, at least one of the following holds:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Example 1 – In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}; candidate key: {STUD_NO}. For this relation, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true, so STUD_COUNTRY is transitively dependent on STUD_NO. This violates the third normal form. To convert it to third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example 2 – Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A. All possible candidate keys of the above relation are {A, E, CD, BC}. All attributes on the right sides of all functional dependencies are prime, so the relation is in 3NF.
Example 1 – Find the highest normal form of a relation R(A, B, C, D, E) with FD set {BC -> D, AC -> BE, B -> E}.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but no proper subset of AC can determine all attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the relation, so there is only one candidate key, {AC}.
Step 2. Prime attributes are those that are part of a candidate key: {A, C} in this example. The others are non-prime: {B, D, E}.
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes. The relation is in 2nd normal form because BC -> D is in 2nd normal form (BC is not a proper subset of candidate key AC), AC -> BE is in 2nd normal form (AC is a candidate key), and B -> E is in 2nd normal form (B is not a proper subset of candidate key AC). The relation is not in 3rd normal form because in BC -> D neither BC is a super key nor D a prime attribute, and in B -> E neither B is a super key nor E a prime attribute; to satisfy 3rd normal form, either the LHS of an FD should be a super key or the RHS should be a prime attribute. So the highest normal form of the relation is 2nd normal form.
Example 2 – Consider relation R(A, B, C) with A -> BC and B -> A. A and B are both super keys, so the above relation is in BCNF.
Data consistency: Normal forms ensure that data is consistent and does not contain any redundant
information. This helps to prevent inconsistencies and errors in the database.
Data redundancy: Normal forms minimize data redundancy by organizing data into tables that contain only
unique data. This reduces the amount of storage space required for the database and makes it easier to
manage.
Query performance: Normal forms can improve query performance by reducing the number of joins
required to retrieve data. This helps to speed up query processing and improve overall system performance.
Database maintenance: Normal forms make it easier to maintain the database by reducing the amount of
redundant data that needs to be updated, deleted, or modified. This helps to improve database management
and reduce the risk of errors or inconsistencies.
Database design: Normal forms provide guidelines for designing databases that are efficient, flexible, and
scalable. This helps to ensure that the database can be easily modified, updated, or expanded as needed.
1. BCNF is free from redundancy.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of relation are prime attribute, then the relation is always in 3NF.
4. A relation in a Relational Database is always and at least in 1NF form.
5. Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
6. If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute),
then the Relation is always in 2NF( because no Partial functional dependency possible).
7. Sometimes decomposing into BCNF may not preserve functional dependencies. In that case, go for BCNF only if the lost FD(s) are not required; otherwise, normalize till 3NF only.
8. There are many more Normal forms that exist after BCNF, like 4NF and more. But in real world database
systems it’s generally not required to go beyond BCNF.