RDBMS Unit-5
RDBMS Unit-5
UNIT-5
Features
▪ Databases in the collection are logically interrelated with each other. Often, they
represent a single logical database.
▪ Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
▪ The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
Features
▪ It synchronizes the database periodically and provides access mechanisms by the virtue
of which the distribution becomes transparent to the users.
▪ It is used in application areas where large volumes of data are processed and accessed
by numerous users simultaneously.
▪ Database Recovery
Most organizations in the current times are subdivided into multiple units that are
physically distributed over the globe. Each unit requires its own set of local data. Thus, the
overall database of the organization becomes distributed.
The multiple organizational units often need to communicate with each other and share
their data and resources. This demands common databases or replicated databases that
should be used in a synchronized manner.
Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) work
upon diversified systems which may have common data. Distributed database systems
aid both these processing by providing synchronized data.
Database Recovery
One of the common techniques used in DDBMS is replication of data across different
sites. Replication of data automatically helps in data recovery if database in any site is
damaged. Users can access data from other sites while the damaged site is being
reconstructed. Thus, database failure may become almost inconspicuous to users.
Most organizations use a variety of application software each with its specific database
support. DDBMS provides a uniform functionality for using the same data among
different platforms.
▪ Data are located near the greatest demand site. The data in a distributed database
system are dispersed to match business requirements which reduce the cost of data
access.
▪ Faster data access. End users often work with only a locally stored subset of the
company’s data.
▪ Faster data processing. A distributed database system spreads out the systems
workload by processing data at several sites.
▪ Growth facilitation. New sites can be added to the network without affecting the
operations of other sites.
▪ Improved communications. Because local sites are smaller and located closer to
customers, local sites foster better communication among departments and between
customers and company staff.
RDBMS by P Nagaiah Goud
▪ User-friendly interface. PCs and workstations are usually equipped with an easy-to-use
graphical user interface (GUI). The GUI simplifies training and use for end users.
▪ Less danger of a single-point failure. When one of the computers fails, the workload is
picked up by other workstations. Data are also distributed at multiple sites.
▪ Processor independence. The end user is able to access any available copy of the data,
and an end user's request is processed by any processor at the data location.
Disadvantages of DDBMS:
▪ Complexity of management and control. Applications must recognize data location, and
they must be able to stitch together data from various sites. Database administrators
must have the ability to coordinate database activities to prevent database degradation
due to data anomalies.
▪ Security. The probability of security lapses increases when data are located at multiple
sites. The responsibility of data management will be shared by different people at
several sites.
▪ Increased training cost. Training costs are generally higher in a distributed model than
they would be in a centralized model, sometimes even to the extent of offsetting
operational and hardware savings. Costs. Distributed databases require duplicated
infrastructure to operate (physical location, environment, personnel, software,
licensing, etc.)
RDBMS by P Nagaiah Goud
The purpose of data dictionary is to provide relevant information about location and
replication during query processing. It provides the information by assuring that the updates
have been transferred to the desired locations. The data dictionary can be managed at
centralized location or at distributed locations. However, to obtain complete data dictionary, all
the distributed subsets of the dictionary must be integrated. DDBMS provides a provision of
user interaction. This interaction is done by means of transactions. Transaction is the
mechanism of executing programs. DDBMS transactions consist of multiple processes. Each of
which is controlled by independent software modules.
an agent is to access only the data which is under the control of its local data management
software.
Basically, the execution of a transaction requires an initiating agent which can activate
agents of other sites by means of a request sent so as to get access over the required data.
Once the agent gets activated, they can interact with each other by means of message
exchange mechanism. This interaction requires cooperation of two or more agents.
To access records, transaction issues read and write operations. The sites that take part
in the DBMS can run one or more software modules such as a Transaction Manager(TM), a Data
Manager (DM) and a scheduler.
The relationships among various software modules involved in DDBMS are as follows.
User processor and Data processor are the two major components of Distributed DBMS
architecture. These major components handle different user requests using several sub-
components in a Peer-to-Peer Distributed DBMS. Those are;
▪ User Processor
▪ Data Processor
User Processor
▪ User interface handler – interpreting user commands when they are given in, and
formatting the result sets when the request is answered.
▪ Semantic data controller – uses the Global Conceptual Schema to check the integrity
constraints defined on database elements and also to check the authorizations on
accessing the requested database.
▪ Global query optimizer and decomposer – devises a best execution strategy to execute
the given user requests in minimal cost (in terms of time, processor, memory). It is like
RDBMS by P Nagaiah Goud
Data Processor
▪ Local query optimizer – it optimizes data access by choosing the best access path. For
example, Local query optimizer decides which index to be used for optimally executing
the given query.
▪ Local recovery manager – deals with the consistency of the local database. In case of
failure, local recovery manager is responsible for maintaining a consistent database.
Advantages:
RDBMS by P Nagaiah Goud
Disadvantages:
• Complex Software: – Complex implementation. Costs more in terms of software cost
compared to a centralized system. Additional software might be needed in most of the
cases over a centralized system.
• Increased Processing overhead: – It costs many messages to be shared between sites to
complete a distributed transaction.
• Data integrity: – Data integrity becomes complex. Too much network resources may be
used.
• Different data formats might be used – This may cost time.
• Deadlock is difficult to handle compared to a centralized system.
• May cause much more network traffic in case of write operation in a replicated form of
distributed database.
• Distributed System supported Operating System is required to implement distributed
database system.
• The data shared between sites over networks are vulnerable to attack. Hence, network-
oriented security protocols to be used based on the sensitivity of data shared.
• More complex in terms database design – According to various applications, we may
need to fragment a database, or replicate a database or both.
• Handling failures is a difficult task. In some cases, we may not distinguish site failure,
network partition, and link failure.
RDBMS by P Nagaiah Goud
• Easy usage of Data: It makes most frequently accessed set of data near to the
user. Hence these data can be accessed easily as and when required by them.
• Efficiency: It in turn increases the efficiency of the query by reducing the size of
the table to smaller subset and making them available with less network access
time.
• Security: It provides security to the data. That means only valid and useful
records will be available to the actual user. The DB near to the user will not have
any unwanted data in their DB. It will contain only that information, which are
necessary for them.
• Parallelism: Fragmentation allows user to access the same table at the same
time from different locations. Users at different locations will be accessing the
same table in the DB at their location, seeing the data that are meant for them. If
they are accessing the table at one location, then they have to wait for the locks
to perform their transactions.
• Reliability: It increases the reliability of fetching the data. If the users are located
at different locations accessing the single DB, then there will be huge network
load. This will not guarantee that correct records are fetched and returned to the
user. Accessing the fragment of data in the nearest DB will reduce the risk of
data loss and correctness of data.
• Balanced Storage: Data will be distributed evenly among the databases in DDB.
Fragmentation of data can be done according to the DBs and user requirement. But while
fragmenting the data, below points should be kept in mind:
• Completeness: While creating the fragment, partial records in the table should
not be considered. Fragmentation should be performed on whole table’s data to
get the correct result.
• Reconstructions: When all the fragments are combined, it should give whole
table’s data. That means whole table should be able to reconstruct using all
fragments.
• Disjointedness: There should not be any overlapping data in the fragments. If so,
it will be difficult to maintain the consistency of the data. Effort needs to be put
to create same replication in all the copies of data.
RDBMS by P Nagaiah Goud
site 2
LBN HTN
Local DBMS Site 1 Local DBMS site 3
Types:
• Horizontal Data Fragmentation
• Vertical Data Fragmentation
• Hybrid Data Fragmentation
Horizontal Data Fragmentation:
• Here the data / records are fragmented horizontally. i.e.; horizontal subset of table data
is created and are stored in different database in DDB.
RDBMS by P Nagaiah Goud
• For example, consider the employees working at different locations of the organization
like India, USA, UK etc. number of employees from all these locations are not a small
number. They are huge in number. When any details of any one employee are required,
whole table needs to be accessed to get the information. Again, the employee table may
present in any location in the world. But the concept of DDB is to place the data in the
nearest DB so that it will be accessed quickly. Hence what we do is divide the entire
employee table data horizontally based on the location.
This is the vertical subset of a relation. That means a relation / table is fragmented by
considering the columns of it.
For example, consider the EMPLOYEE table with ID, Name, Address, Age, location, DeptID,
ProjID. The vertical fragmentation of this table may be dividing the table into different tables
with one or more columns from EMPLOYEE.
This type of fragmentation can be done in any order. It does not have any particular order. It is
solely based on the user requirement. But it should satisfy fragmentation conditions.
RDBMS by P Nagaiah Goud
Synchronous Replication:
In synchronous replication, the replica will be modified immediately after some changes are
made in the relation table. So, there is no difference between original data and replica.
Asynchronous replication:
In asynchronous replication, the replica will be modified after commit is fired on to the
database.
Replication Schemes
LBN HTN
Advantages of no replication
• Concurrency can be minimized.
• Easy recovery of data.
Disadvantages of no replication
• Poor availability of data.
• Slows down the query execution process, as multiple clients are accessing the same
server.
Partial replication
Partial replication means only some fragments are replicated from the database.
• In data replication, storage space required gets higher as the replicas needs more space
going through various sites at a time.
• In data replication, the cost to replicate the data at all sites also gets increased as every
site needs to get updated altogether.
• In data replication, it becomes hard to maintain the consistency of data.
• In data replication, complexity of data increases as well.
5.7 Client Server database and Need for Client Server Computing
Client server database system:
It is a system in which server manages the resources and client consumed these resources. The
client and the server are the logical entities that work together over a network to accomplish a
task.
• Service
• Resource sharing
• Asymmetrical protocols
• Transparency of location
• Inter- Communication via messages
• Encapsulation of services
• Scalability
• Integrity
Transparency of location: The server process can reside in a client or any machine across a
network. In such situations Client/Server software is responsible for implying the server
location by redirecting service calls. Therefore, a program can be a client/server/or both.
Inter- Communication via messages: Interaction between clients and servers is obtained
through a message-passing mechanism mainly to deliver service requests and responses.
Integrity: Since the server code and server data is managed centrally, maintenance cost is
less and results in shared data consistency and undependability of clients.
Client server computing is typically needed in large databases systems because of the
following reasons.
• To handover all the computational responsibilities to server while client can only
generate request for the data.
• To develop a customized platform for specific applications.
• To fasten the processing by providing high speed links that connect hard disks and
processors.
• They are required to develop applications that are based on management
information systems.
• To distribute the workload among client and server.
• 1-Tier Architecture
• 2-Tier Architecture
• 3-Tier Architecture
• N-Tier Architecture
1-Tier Architecture
In the 1-tier architecture, all client/server configuration setting, user interface environment,
data logic, and marketing logic system are existed on the same system. These types of
services are reliable but it is very difficult tasks to handle because they contain all data in
different variance, which are allotted the replication of entire work. This architecture also
contains the different layers
RDBMS by P Nagaiah Goud
For example – Presentation, Business, Data Access layer with using of single software
package. All data is saved on the local machine. Some applications, which manage all three
tiers like as MP3 player, MS Office; but these types of applications are presented under 1-
tier architecture applications.
2-Tier Architecture
2-tier architecture provides the best client/server environment that helps to store user
interface on the client system and all database is saved on the server machine. Business logic
and database logic are existed on the client otherwise server, but they are required to be
maintained. When data logic and business are gathered on the client terminal then it is known
as “fat client thin server architecture”. But if Business Logic and Data Logic are controlled at the
server machine then it is known as “thin client fat server architecture”.
In this architecture, client and server machines are connected directly incorporation because if
client is firing any input for server terminal then in between should not any intermediate. So, it
delivers the output with fastest rate and to ignore misunderstanding between the other
clients. For example – online ticket reservations program, in which 2-tier architecture is used
Benefits :
RDBMS by P Nagaiah Goud
Limitations :
3-Tier Architecture
In this 3-tier architecture, middleware is needed because if client machine sends the request to
server machine, then firstly this request is received by middle layer, and finally this request is
obtained to server. So, firstly response of server is received by middle layer then it is obtained
to client machine. All data logic and business logic are stored on the middleware. Due to use of
middleware, to improve its flexibility and deliver excellent performance.
3-tier architecture is divided into 3 layers such as presentation layer (Client Tier), Application
layer (Business Tier) and Database layer (Data Tier). Client machine handles the presentation
layer, Application layer controls the Application layer, and finally Server machine takes care of
Database layer.
Benefits:
Limitation :
N-Tier Architecture
This architecture is also known as the “Multitier Architecture”, so it is scaled form of 3-tier
architecture. In this architecture, entire presentations, application processing, and data
management functions are isolated from each other.
Benefit
Limitations
It is the collection of interrelated multimedia data that includes text, graphics, images,
animations, video, audio etc. and have vast amounts of multisource multimedia data. The
framework that manages different types of multimedia data which can be stored, delivered and
utilized in different ways is known as multimedia database management system. There are
three classes of the multimedia database which includes
▪ Static media
▪ Dynamic media
▪ Dimensional media
• Media format data – Information such as sampling rate, resolution, encoding scheme
etc. about the format of the media data after it goes through the acquisition, processing
and encoding phase.
• Media keyword data – Keywords description relating to the generation of data. It is also
known as content descriptive data. Example: date, time and place of recording.
RDBMS by P Nagaiah Goud
• Media feature data – Content dependent data such as the distribution of colors, kinds
of texture and different shapes present in data.
▪ Modelling – Working in this area can improve database versus information retrieval
techniques thus, documents constitute a specialized area and deserve special
consideration.
▪ Design – The conceptual, logical and physical design of multimedia databases has not
yet been addressed fully as performance and tuning issues at each level are far more
complex as they consist of a variety of formats like JPEG, GIF, PNG, MPEG which is not
easy to convert from one form to another.
▪ Storage – Storage of multimedia database on any standard disk presents the problem of
representation, compression, mapping to device hierarchies, archiving and buffering
during input-output operation. In DBMS, a ”BLOB”(Binary Large Object) facility allows
untyped bitmaps to be stored and retrieved.
▪ Queries and retrieval –For multimedia data like images, video, audio accessing data
through query opens up many issues like efficient query formulation, query execution
and optimization which need to be worked upon.
▪ Documents and record management: Industries and businesses that keep detailed
records and variety of documents. Example: Insurance claim record.
These databases are separate from the main database and can easily be transported to various
places. Even though they are not connected to the main database, they can still communicate
with the database to share and exchange data.
▪ The main system database that stores all the data and is linked to the mobile database.
▪ The mobile database that allows users to view information even while on the move. It
shares information with the main database.
▪ The device that uses the mobile database to access data. This device can be a mobile
phone, laptop etc.
▪ A communication link that allows the transfer of data between the mobile database and
the main database.
RDBMS by P Nagaiah Goud
▪ The data in a database can be accessed from anywhere using a mobile database. It
provides wireless database access.
▪ The database systems are synchronized using mobile databases and multiple users can
access the data with seamless delivery process.
▪ The mobile database can be synchronized with multiple devices such as mobiles,
computer devices, laptops etc.
▪ The mobile data is less secure than data that is stored in a conventional stationary
database. This presents a security hazard.
▪ The mobile unit that houses a mobile database may frequently lose power because of
limited battery. This should not lead to loss of data in database.
Web databases enable collected data to be organized and cataloged thoroughly within
hundreds of parameters. The Web database does not require advanced computer skills, and
many database software programs provide an easy "click-and-create" style with no complicated
coding. Fill in the fields and save each record. Organize the data however you choose, such as
chronologically, alphabetically or by a specific set of parameters.
Web database software programs are found within desktop publishing programs, such as
Microsoft Office Access and OpenOffice Base. Other programs include the Webex WebOffice
database and FormLogix Web database. The most advanced software applications can set up
data collection forms, polls, feedback forms and present data analysis in real time.
Applicable Uses
Businesses both large and small can use Web databases to create website polls, feedback
forms, client or customer and inventory lists. Personal Web database use can range from
storing personal email accounts to a home inventory to personal website analytics. The Web
database is entirely customizable to an individual's or business's needs.
▪ These are used mostly for OLAP (online analytical processing) and data warehousing.
They can be used to show multiple dimensions of data to users .
▪ The data in multidimensional databases is stored in a data cube format. This means that
data can be seen and understood from many dimensions and perspectives.
RDBMS by P Nagaiah Goud
▪ The revenue costs for a company can be understood and analyzed on the basis of
various factors like the company products, the geographical locations of the company
offices, time to develop a product, promotions done etc.
▪ Increased performance
▪ Easy maintenance
▪ One of the disadvantages of multidimensional databases are that it is quite complex and
it takes professionals to truly understand and analyse the data in the database.
On-Line Transaction Processing (OLTP) System refers to the system that manage transaction-
oriented applications. These systems are designed to support on-line transaction and process
query quickly on the Internet.
For example: POS (point of sale) system of any supermarket is a OLTP System.
RDBMS by P Nagaiah Goud
Every industry in today’s world use OLTP system to record their transactional data. The main
concern of OLTP systems is to enter, store and retrieve the data. They covers all day to day
operations such as purchasing, manufacturing, payroll, accounting, etc.of an organization. Such
systems have large numbers of user which conduct short transaction. It supports simple
database query so the response time of any user action is very fast.
The data acquired through an OLTP system is stored in commercial RDBMS, which can be used
by an OLAP System for data analytics and other business intelligence operations.
Some other examples of OLTP systems include order entry, retail sales, and financial
transaction systems.
• OLTP Systems are user friendly and can be used by anyone having basic understanding
• It allows its user to perform operations like read, write and delete data quickly.
• It responds to its user actions immediately as it can process query very quickly.
• It allows multiple users to access and change the same data at the same time. So it
requires concurrency control and recovery mechanism to avoid any unprecedented
situations
• The data acquired through OLTP systems are not suitable for decision making. OLAP
systems are used for the decision making or “what if” analysis.
OLAP
OLAP stands for Online Analytical Processing Server. It is a software technology that allows
users to analyze information from multiple database systems at the same time. It is based on
multidimensional data model and allows the user to query on multi-dimensional data. OLAP
databases are divided into one or more cubes and these cubes are known as Hyper-cubes.
RDBMS by P Nagaiah Goud
Advantages of OLAP
• OLAP is a platform for all type of business includes planning, budgeting, reporting, and
analysis.
• Information and calculations are consistent in an OLAP cube. This is a crucial benefit.
• Quickly create and analyze “What if” scenarios
• Easily search OLAP database for broad or specific terms.
• OLAP provides the building blocks for business modeling tools, Data mining tools,
performance reporting tools.
• Allows users to do slice and dice cube data all by various dimensions, measures, and
filters.
• It is good for analyzing time series.
• Finding some clusters and outliers is easy with OLAP.
• It is a powerful visualization online analytical process system which provides faster
response times
Disadvantages of OLAP
• OLAP requires organizing data into a star or snowflake schema. These schemas are
complicated to implement and administer
• You cannot have large number of dimensions in a single OLAP cube
• Transactional data cannot be accessed with OLAP system.
• Any modification in an OLAP cube needs a full update of the cube. This is a time-
consuming process
8 It allows only read and hardly write operations. It allows both read and write
operations.
9 Here, the complex queries are involved. Here, the queries are simple.
Parallel Databases
Companies need to handle huge amount of data with high data transfer rate. The client server
and centralized system is not much efficient. The need to improve the efficiency gave birth to
the concept of Parallel Databases.
Parallel database system improves performance of data processing using multiple resources in
parallel, like multiple CPU and disks are used parallelly. It also performs many parallelization
operations like, data loading and query processing.
Improve performance:
The performance of the system can be improved by connecting multiple CPU and disks in
parallel. Many small processors can also be connected in parallel.
Improve reliability:
Reliability of system is improved with completeness, accuracy and availability of data.
Data Warehouse
• A data warehouse is a Relational database that is designed for query and analysis
rather than transaction processing. It includes historical data derived from
transaction data from single and multiple sources.
• A data warehouse provides integrated, enterprise-wide, historical data and
focuses on providing support for decision makers for data modeling and analysis.
• A data warehouse is a group of data specific to the entire organization, not only
to a particular group of users.
• It is not used for daily operations and transaction processing but used for making
decision.
Data Warehouse Features
The key features of a data warehouse are discussed below −
• Subject Oriented − A data warehouse is subject oriented because it provides
information around a subject rather than the organization's ongoing operations.
These subjects can be product, customers, suppliers, sales, revenue, etc. A data
warehouse does not focus on the ongoing operations, rather it focuses on
modelling and analysis of data for decision making.
• Integrated − A data warehouse is constructed by integrating data from
heterogeneous sources such as relational databases, flat files, etc. This
integration enhances the effective analysis of data.
• Time Variant − The data collected in a data warehouse is identified with a
particular time period. The data in a data warehouse provides information from
the historical point of view.
• Non-volatile − Non-volatile means the previous data is not erased when new
data is added to it. A data warehouse is kept separate from the operational
database and therefore frequent changes in operational database is not reflected
in the data warehouse.
RDBMS by P Nagaiah Goud
Applications:
Data warehouses are widely used in the following fields −
• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing
Information processing, analytical processing, and data mining are the three types of data
warehouse applications.
• Information Processing − A data warehouse allows to process the data stored in
it. The data can be processed by means of querying, basic statistical analysis,
reporting using crosstabs, tables, charts, or graphs.
• Analytical Processing − A data warehouse supports analytical processing of the
information stored in it. The data can be analyzed by means of basic OLAP
operations, including slice-and-dice, drill down, drill up, and pivoting.
• Data Mining − Data mining supports knowledge discovery by finding hidden
patterns and associations, constructing analytical models, performing
classification and prediction. These mining results can be presented using the
visualization tools.
RDBMS by P Nagaiah Goud
Features of NoSQL
Non-relational
Schema-free
RDBMS by P Nagaiah Goud
Simple API
• Offers easy to use interfaces for storage and querying data provided
• APIs allow low-level data manipulation & selection methods
Distributed
Advantages of NoSQL
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• It does not offer any traditional database capabilities, like consistency when multiple
transactions are performed simultaneously.
• When the volume of data increases it is difficult to maintain unique values as keys
become difficult
• Doesn’t work as well with relational data
• Open-source options so not so popular for enterprises.