Module 1_DBMS (1)
Module 1_DBMS (1)
Module 1_DBMS (1)
What is data?
Data is a collection of a distinct small unit of information. It can be used in a variety of forms like text,
numbers, media, bytes, etc. it can be stored in pieces of paper or electronic memory, etc.Word 'Data'
originates from the word 'datum' that means 'single piece of information.' It is plural of the word datum.In
computing, Data is information that can be translated into a form for efficient movement and processing.
Data is interchangeable.
What is a database?
A database is an organized collection of data, so that it can be easily accessed and managed. You can
organize data into tables, rows, columns, and index it to make it easier to find relevant information. The
main purpose of the database is to operate a large amount of information by storing, retrieving, and
managing data.
The management system is important because without the existence of some kind of rules and regulations
it is not possible to maintain the database. We have to select the particular attributes which should be
included in a particular table; the common attributes to create a relationship between two tables; if a new
record has to be inserted or deleted then which tables should have to be handled etc. These issues must be
resolved by having some kind of rules to follow in order to maintain the integrity of the database.
Database systems are designed to manage large bodies of information. Management of data involves both
defining structures for storage of information and providing mechanisms for the manipulation of
information. In addition, the database system must ensure the safety of the information stored, despite
system crashes or attempts at unauthorized access. If data are to be shared among several users, the
system must avoid possible anomalous results. Because information is so important in most organizations,
computer scientists have developed a large body of concepts and techniques for managing data.
Advantages of DBMS:
● Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e storing the
same data multiple times). In a database system, by having a centralized database and centralized
control of data by the DBA the unnecessary duplication of data is avoided. It also eliminates the
extra time for processing the large volume of data. It results in saving the storage space.
● Improved Data Sharing : DBMS allows a user to share the data in any number of application
programs.
● Data Integrity : Integrity means that the data in the database is accurate. Centralized control of
the data helps in permitting the administrator to define integrity constraints to the data in the
database. For example: in customer databases we can enforce an integrity that must accept the
customer only from Noida and Meerut city.
● Security : Having complete authority over the operational data, enables the DBA in ensuring that
the only means of access to the database is through proper channels. The DBA can define
authorization checks to be carried out whenever access to sensitive data is attempted.
● Data Consistency : By eliminating data redundancy, we greatly reduce the opportunities for
inconsistency. For example: if a customer address is stored only once, we cannot have
disagreement on the stored values. Also updating data values is greatly simplified when each
value is stored in one place only. Finally, we avoid the wasted storage that results from redundant
data storage.
● Efficient Data Access : In a database system, the data is managed by the DBMS and all access to
the data is through the DBMS providing a key to effective data processing
● Enforcements of Standards : With the centralized of data, DBA can establish and enforce the
data standards which may include the naming conventions, data quality standards etc.
● Data Independence : Ina database system, the database management system provides the
interface between the application programs and the data. When changes are made to the data
representation, the meta data obtained by the DBMS is changed but the DBMS continues to
provide the data to the application program in the previously used way. The DBMs handles the
task of transformation of data wherever necessary.
● Reduced Application Development and Maintenance Time : DBMS supports many important
functions that are common to many applications, accessing data stored in the DBMS, which
facilitates the quick development of applications.
Database Architecture:
The architecture of a database system is greatly influenced by the underlying computer system on which
the database system runs. Database systems can be centralized, or client-server, where one server machine
executes work on behalf of multiple client machines. Database systems can also be designed to exploit
parallel computer architectures. Distributed databases span multiple geographically separated machines
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system. The functional components of a database system can be broadly divided into the storage manager
and the query processor components. The storage manager is important because databases typically
require a large amount of storage space. The query processor is important because it helps the database
system simplify and facilitate access to data.
Database applications are usually partitioned into two or three parts, as in Figure.In a two-tier
architecture, the application resides at the client machine, where it invokes database system functionality
at the server machine through query language statements. Application program interface standards like
ODBC and JDBC are used for interaction between the client and the server. In contrast, in a three-tier
architecture, the client machine acts as merely a front end and does not contain any direct database calls.
Instead, the client communicates with an application server, usually through a forms interface.
The application server in turn communicates with a database system to access data. The business logic of
the application, which says what actions to carry out under what conditions, is embedded in the
application server, instead of being distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the World Wide Web.
Two-tier architecture Three-tier architecture
● Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to accessdata.·
● Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.·
● File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.·
● Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is a
critical part of the database system, since it enables the database to handle data sizes that
are much larger than the size of main memory
Transaction Manager:A transaction is a collection of operations that performs a single logical function
in a database application. Each transaction is a unit of both atomicity and consistency. Thus, we require
that transactions do not violate any database-consistency constraints. That is, if the database was
consistent when a transaction started, the database must be consistent when the transaction successfully
terminates. Transaction -manager ensures that the database remains in a consistent (correct) state despite
system failures (e.g., power failures and operating system crashes) and transaction failures.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to
use complex data structures to represent data in the database. Since many database-system users are not
computer trained, developers hide the complexity from users through several levels of abstraction, to
simplify users’ interactions with the system:
● Physical level (or Internal View / Schema): The lowest level of abstraction describes how the
data are actually stored. The physical level describes complex low-level data structures in detail.
● Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes
what data are stored in the database, and what relationships exist among those data. The logical
level thus describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this
complexity. This is referred to as physical data independence. Database administrators, who must
decide what information to keep in the database,use the logical level of abstraction.
● View level (or External View / Schema): The highest level of abstraction describes only part of
the entire database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database.Many users of the database
system do not need all this information; instead, they need to access only a part of the database.
The view level of abstraction exists to simplify their interaction with the system. The system may
provide many views for the same database. Figure shows the relationship among the three levels
of abstraction.
Data Independence
Data independence is the ability to modify the scheme without affecting the programs and the application
to be rewritten. Data is separated from the programs, so that the changes made to the data will not affect
the program execution and the application.
We know the main purpose of the three levels of data abstraction is to achieve data independence. If the
database changes and expands over time, it is very important that the changes in one level should not
affect the data at other levels of the database. This would save time and cost required when changing the
database.
There are two levels of data independence based on three levels of abstraction. These are as follows −
Physical Data Independence means changing the physical level without affecting the logical level or
conceptual level. Using this property, we can change the storage device of the database without affecting
the logical schema.
The changes in the physical level may include changes using the following −
Logical view of data is the user view of the data. It presents data in the form that can be accessed by the
end users.
Codd’s Rule of Logical Data Independence says that users should be able to manipulate the Logical View
of data without any information of its physical storage. Software or the computer program is used to
manipulate the logical view of the data. Database administrator is the one who decides what information
is to be kept in the database and how to use the logical level of abstraction. It provides the global view of
Data. It also describes what data is to be stored in the database along with the relationship.
The data independence provides the database in simple structure. It is based on application domain
entities to provide the functional requirement. It provides abstraction of system functional requirements.
Static structure for the logical view is defined in the class object diagrams. Users cannot manipulate the
logical structure of the database.
● Procedural DMLs require a user to specify what data is needed and how to get that data.
● Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
is needed without specifying how to get those data. Declarative DMLs are usually easier to learn
and use than are procedural DMLs. However, since a user does not have to specify how to get the
data, the database system has to figure out an efficient means of accessing data. A query is a
statement requesting the retrieval of information.
The data values stored in the database must satisfy certain consistency constraints.For example, suppose
the university requires that the account balance of a department must never be negative. The DDL
provides facilities to specify such constraints. The database system checks these constraints every time the
database is updated. In general, a constraint can be an arbitrary predicate pertaining to the database.
However, arbitrary predicates may be costly to test. Thus, database systems implement integrity
constraints that can be tested with minimal overhead.
Data Models
Data models define how the logical structure of a database is modeled. Data Models are fundamental
entities to introduce abstraction in a DBMS. Data models define how data is connected to each other and
how they are processed and stored inside the system.
The very first data model could be flat data-models, where all the data used are to be kept in the same
plane. Earlier data models were not so scientific, hence they were prone to introduce lots of duplication
and update anomalies.
1. Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among
them. While formulating real-world scenarios into the database model, the ER Model creates entity set,
relationship set, general attributes and constraints. ER Model is best used for the conceptual design of a
database.
ER Model is based on −
● Entity − An entity in an ER Model is a real-world entity having properties called attributes. Every
attribute is defined by its set of values called domain. For example, in a school database, a student
is considered as an entity. Students have various attributes like name, age, class, etc.
● Relationship − The logical association among entities is called relationship. Relationships are
mapped with entities in various ways. Mapping cardinalities define the number of associations
between two entities.
● Mapping cardinalities −
○ one to one
○ one to many
○ many to one
○ many to many
2. Relational Model
The most popular data model in DBMS is the Relational Model. It is more scientific a model than
others. This model is based on first-order predicate logic and defines a table as an n-ary relation.
The main highlights of this model are −
3. Network Model
The network model was created to represent complex data relationships more effectively when compared
to hierarchical models, to improve database performance and standards.
It has entities which are organized in a graphical representation and some entities are accessed through
several paths. A User perceives the network model as a collection of records in 1:M relationships.
● Ability to Merge Relationships − In this model, because of more relationships the data is more
related. It has an ability to manage one-to-one relationships as well as many-to-many
relationships.
● Many paths − There can be more than one path to the same record because of more relationships.
It makes data access fast and simple.
● Circular Linked List − The operations in this model are done with the help of the circular linked
list. The current position is maintained with the help of a program and navigates through the
records based on relationships.
● Objects: The real world entities and situations are represented as objects in the Object oriented
database model.
● Attributes and Method: Every object has certain characteristics. These are represented using
Attributes. The behavior of the objects is represented using Methods.
● Class: Similar attributes and methods are grouped together using a class. An object can be called
as an instance of the class.
● Inheritance: A new class can be derived from the original class. The derived class contains
attributes and methods of the original class as well as its own.
Integrity Constraints
● Integrity constraints are a set of rules. It is used to maintain the quality of information.
● Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
● Thus, integrity constraint is used to guard against accidental damage to the database.
1. Domain constraints
● Domain constraints can be defined as the definition of a valid set of values for an attribute.
● The data type of domain includes string, character, integer, time, date, currency, etc. The value of
the attribute must be available in the corresponding domain.
2. Entity integrity constraints
● The entity integrity constraint states that primary key value can't be null.
● This is because the primary key value is used to identify individual rows in relation and if the
primary key has a null value, then we can't identify those rows.
● A table can contain a null value other than the primary key field.
● Keys are the entity set that is used to identify an entity within its entity set uniquely.
● An entity set can have multiple keys, but out of which one key will be the primary key. A primary
key can contain a unique and null value in the relational table.