[go: up one dir, main page]

0% found this document useful (0 votes)
63 views26 pages

Unit 3 Data Warehousing and OLAP

The document provides an overview of data warehousing, including its definition, benefits, and disadvantages. It discusses various applications across sectors like banking, finance, and healthcare, and outlines the characteristics and architecture of data warehouses. Additionally, it compares operational databases with data warehouses and explains different types of data marts.

Uploaded by

donmanish98072
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
63 views26 pages

Unit 3 Data Warehousing and OLAP

The document provides an overview of data warehousing, including its definition, benefits, and disadvantages. It discusses various applications across sectors like banking, finance, and healthcare, and outlines the characteristics and architecture of data warehouses. Additionally, it compares operational databases with data warehouses and explains different types of data marts.

Uploaded by

donmanish98072
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

1

3. Data Warehousing and Online Analytical Processing


3.1. Basic Concepts of Data Warehousing
A data warehouse can be defined as a collection of organizational data and information extracted from
operational sources and external data sources. The data is periodically pulled from various internal
applications like sales, marketing, and finance; customer-interface applications; as well as external
partner systems. This data is then made available for decision-makers to access and analyze.

3.2. Use and Benefits of Data Warehousing


Intelligent Decision-Making: With centralized data in warehouses, decisions may be made more
quickly and intelligently.
Business Intelligence: Provides strong operational insights through business intelligence.
Data Quality: Guarantees data quality and consistency for trustworthy reporting.
Scalability: Capable of managing massive data volumes and expanding to meet changing
requirements.
Effective Queries: Fast and effective data retrieval is made possible by an optimized structure.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


2

Cost reductions: Data warehousing can result in cost savings over time by reducing data management
procedures and increasing overall efficiency, even when there are setup costs initially.
Data security: Data warehouses employ security protocols to safeguard confidential information,
guaranteeing that only authorized personnel are granted access to certain data.
Faster Queries: The data warehouse is designed to handle large queries that’s why it runs queries
faster than the database..
Historical Insight: The warehouse stores all your historical data which contains details about the
business so that one can analyze it at any time and extract insights from it.
Disadvantages of Data Warehousing
Cost: Building a data warehouse can be expensive, requiring significant investments in hardware,
software, and personnel.
Complexity: Data warehousing can be complex, and businesses may need to hire specialized personnel
to manage the system.
Time-consuming: Building a data warehouse can take a significant amount of time, requiring
businesses to be patient and committed to the process.
Data integration challenges: Data from different sources can be challenging to integrate, requiring
significant effort to ensure consistency and accuracy.
Data security: Data warehousing can pose data security risks, and businesses must take measures to
protect sensitive data from unauthorized access or breaches.

3.3. Application of Data Warehousing


Banking
• Identify the potential risk of default and manage and control collections
• Performance analysis of each product, service, interchange, and exchange rates
• Track performance of accounts and user data
• Provide feedback to bankers regarding customer relationships and profitability

Finance
• Evaluation of customer expenses trends
• Maintain transparency in transactions
• Predict/spot defaulters and act accordingly
• Analyze and forecast different aspects of business, stock, and bond performance

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


3

Government
• Maintain and analyze tax records, health policy records, and their respective providers
• Prediction of criminal activities from patterns and trends
• Searching terrorist profile
• Threat assessment and fraud detection

Education
• Store and analyze information about faculty and students
• Maintain student portals to facilitate student activities
• Extract information for research grants and assess student demographics
• Integrate information from different sources into a single repository for analysis and strategic
decision-making

Healthcare
• Generate patient, employee, and financial records
• Share data with other entities, like insurance companies, NGOs, and medical aid services
• Use data mining to identify patient trends
• Provide feedback to physicians on procedures and tests

Insurance
• Analyze data patterns and customer trends – Maintain records of all internal and external
sources, including existing participants
• Design customized offers and promotions for customers
• Predict and analyze changes in the industry

Manufacturing
• Predict market changes and analyze current business trends
• Analyze previous and current market data
• Track customer feedback and identify opportunities for improvement
• Gather, standardize, and store data from various internal and external sources
• Identify profitable product lines and required product features

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


4

Retail
• Maintain records of producers and consumers
• Track items, their promotion strategies, and consumer buying trends (trend analysis)
• Analyze sales to determine shelf space
• Understanding the patterns of complaints, claims, and returns

Services
• Maintenance of financial and employee records
• Customer profiling and screening
• Resource allocation and management
• Revenue patterns and profitability

3.4. Characteristics of Data Warehouse


Key Characteristics of Data Warehouse
Subject-Oriented: A data warehouse is subject-oriented since it provides topic-wise information rather
than the overall processes of a business. Such subjects may be sales, promotion, inventory, etc. For
example, if you want to analyze your company’s sales data, you need to build a data warehouse that
concentrates on sales. Such a warehouse would provide valuable information like ‘who was your best
customer last year?’ or ‘who is likely to be your best customer in the coming year?’
Integrated: A data warehouse is developed by integrating data from varied sources into a consistent
format. The data must be stored in the warehouse in a consistent and universally acceptable manner in
terms of naming, format, and coding. This facilitates effective data analysis.
Non-Volatile: Data once entered into a data warehouse must remain unchanged. All data is read-only.
Previous data is not erased when current data is entered. This helps you to analyze what has happened
and when.
Time-Variant: The data stored in a data warehouse is documented with an element of time, either
explicitly or implicitly. An example of time variance in Data Warehouse is exhibited in the Primary
Key, which must have an element of time like the day, week, or month.

3.5. Operational Database Vs. Data Warehouse


Here are some key characteristics of an operational database:
Real-time processing: Handles live data updates to support business operations instantly.
High availability: Designed for minimal downtime to ensure continuous service.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


5

Transactional integrity: Enforces ACID (atomicity, consistency, isolation, durability) properties for
reliable transaction processing.
Row-oriented storage: Stores data in rows, optimal for frequent insert, update, and delete operations.
Normalized data structure: Organizes data into smaller, related tables to eliminate redundancy and
maintain efficiency.
Support for OLTP (online transaction processing): Prioritizes rapid execution of simple queries and
updates over complex analytics.
Here are some key characteristics of a data warehouse:

Analytical focus: Built to perform OLAP (online analytical processing) tasks like trend analysis and
reporting.
Historical data storage: Consolidates and organizes historical data from multiple sources for long-
term use.
Column-oriented storage: Structures data for efficient retrieval in analytical queries, often using
columnar formats.
Denormalized data structure: Optimized for read-heavy operations by minimizing joins, using star or
snowflake schemas.
Batch processing: Updates data in bulk during scheduled intervals rather than real time.
Scalability: Handles growing volumes of data without compromising query performance.

Key differences between operational databases and data warehouses


Although both operational databases and data warehouses are essential to modern data strategies, they
serve distinct purposes and operate differently in several key areas.
Operational focus
Operational databases are built for OLTP (online transaction processing) tasks. They manage real-time
transactional data, ensuring quick read/write operations to support daily business activities. On the
other hand, data warehouses are tailored for OLAP (online analytical processing) tasks, specializing in
analyzing large datasets and supporting complex queries for decision-making.
Data types
While operational databases deal primarily with current, live data required for immediate operations,
data warehouses store historical data aggregated over time, providing the foundation for trend analysis
and long-term strategic insights.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


6

Performance and speed


Operational databases prioritize quick transactions to handle frequent updates or retrievals efficiently.
In contrast, data warehouses optimize performance for analytical queries, ensuring high-speed analysis
of large datasets, even if updates occur less frequently.
Data architecture
The architecture of an operational database typically uses row-oriented storage, which is ideal for
transactional operations. Data warehouses employ column-oriented storage, which accelerates the
performance of analytical queries by reducing the amount of data scanned.
Data models
Operational databases often follow an entity-relationship model, which emphasizes the relationships
between entities for efficient transaction management. Data warehouses use star or snowflake schemas,
designed to simplify analytical tasks by organizing facts and dimensions.
Data structure
Data in operational databases is normalized, meaning it’s broken down into smaller, related tables to
eliminate redundancy. In contrast, data warehouses use a denormalized structure, consolidating data
into fewer tables to reduce joins and improve query performance.
Users
Operational databases serve front-line staff such as customer service teams or sales representatives who
need access to real-time data. Data warehouses, however, cater to analysts and decision-makers who
require historical data for reporting, business intelligence, and strategic planning.

3.6. Data Warehouse Architecture


Three-Tier Data Warehouse Architecture
Generally a data warehouses adopts a three-tier architecture. Following are the three tiers of the data
warehouse architecture.
Bottom Tier − The bottom tier of the architecture is the data warehouse database server. It is the
relational database system. We use the back end tools and utilities to feed data into the bottom tier.
These back end tools and utilities perform the Extract, Clean, Load, and refresh functions.
Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in either of the
following ways.
By Relational OLAP (ROLAP), which is an extended relational database management system. The
ROLAP maps the operations on multidimensional data to standard relational operations.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


7

By Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data
and operations.
Top-Tier − This tier is the front-end client layer. This layer holds the query tools and reporting tools,
analysis tools and data mining tools.

3.7. Data Warehouse Models

Enterprise Warehouse
An EDW is a data warehouse that encompasses and stores all of an organization’s data from sources
across the entire business. A smaller data warehouse may be specific to a business department or line of
business (like a data mart). In contrast, an EDW is intended to be a single repository for all of an
organization’s data.
Data Sources: These encompass data collected from various operational and transactional systems
within the organization, such as Enterprise Resource Planning (ERP) systems, Customer Relationship
Management (CRM) platforms, finance applications, Internet of Things (IoT) devices, and mobile and
online systems.
Staging Area: This serves as an intermediate space where data is aggregated, cleaned, and prepared
before being loaded into the EDW. The staging area ensures that data is in a consistent and usable
format, ready for further processing and analysis.
Storage layer: The data is finally loaded into the storage space. With the ELT approach, it might still
take some transformation here. But, at that stage, all the general changes will be applied, so the data
will be loaded into its final model(s). As we mentioned, data warehouses are most often relational
databases. DW will also include a database management system and additional storage for metadata.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


8

Presentation or Access Space: This component provides an interface for users to access and interact
with the data stored in the EDW. It enables analytics, querying, reporting, and data sharing, facilitating
efficient decision-making and strategic insights across the organization.

Data marts
A data mart is a data storage system that contains information specific to an organization's business
unit. It contains a small and selected part of the data that the company stores in a larger storage system.
Companies use a data mart to analyze department-specific information more efficiently. It provides
summarized data that key stakeholders can use to quickly make informed decisions.
For example, a company might store data from various sources, such as supplier information, orders,
sensor data, employee information, and financial records in their data warehouse or data lake.
However, the company stores information relevant to, for instance, the marketing department, such as
social media reviews and customer records, in a data mart.

Data mart vs. database


A data mart serves as the front-facing element for a department’s data. You can use a data mart to
retrieve and analyze information. Meanwhile, a database collects, manages, and stores information. You
can then use tools to process, format, and transfer the stored information to a data mart.
Why is a data mart important?
These are some good reasons that companies might use a data mart.
Retrieve data more efficiently

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


9

By using a data mart, companies can access specific information more efficiently. Compared to a data
warehouse, a data mart contains relevant and detailed information that a department accesses
frequently. Therefore, business managers don’t need to search the entire data warehouse to generate
performance reports or graphics.
Streamline decision-making
Companies can create a subset of data from a data warehouse with a data mart. Employees within the
department can then analyze the data and make decisions based on the same set of information.
Control information more effectively
A data mart gives employees highly granular access privileges. This means the company can authorize
a certain person to view or retrieve specific data. It helps companies to improve data governance and
enforce information access policies. For example, you can use data marts to provide user access to
employees for specific information in a data warehouse.
Manage data flexibly
A data mart is smaller and contains fewer tables than a data warehouse. This means data engineers can
manage and change information in a data mart without causing major database changes.

What are the types of data marts?


These are the different types of data marts.

Dependent data mart


A dependent data mart populates its storage with a subset of information from a centralized data
warehouse. The data warehouse gathers all the information from data sources. Then, the data mart
queries and retrieves subject-specific information from the data warehouse.

Pros and cons


Most data management and
administration works are
performed in the data
warehouse. This means that
business analysts do not
need to be highly skilled in
database management to
use information from the
data mart. Although
dependent data marts make

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


10

retrieving information much easier, they present a single point of failure. If the data warehouse fails, all
the connected data marts will also fail.

Independent data mart


An independent data mart does not rely on a central data warehouse or any other data mart. Each data
mart collects information from its sources instead of from a data warehouse. Independent data marts are
suitable for smaller companies, but only specific departments need to access and analyze information.

Pros and cons


Companies can set up independent data
marts with relative ease. However,
managing them might be difficult. This is
because business analysts need to perform
database administrative work at each data
mart. It is straightforward to share data
between different data marts using strategies
like data sharing; departments can read
another department’s data and even augment
it with their own data. However, a strong data cataloging strategy must be put into place to ensure each
department knows what they are looking at.

Hybrid data mart


Hybrid data marts collect information from a data warehouse and from external sources. This allows
companies the flexibility to test independent data sources before they direct the data to the data
warehouse.
For example, suppose you launch a
new product and want to analyze its
initial sales data. The data mart uses
sales information that comes directly
from the e-commerce software and
retrieves sales records for other
products from the data mart. After the
product becomes a permanent fixture
in your store, you channel the
transaction details to the data
warehouse.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


11

3.9. Fact tables and Dimensions Tables


What is Fact Table?
A fact table is a table in a data warehouse schema that stores quantitative data about a business process
for analysis, such as orders, product inventory and financial revenue data. Data inside a fact table
provides an overview of a business process, which is made up of both original numerical records and
records linked from dimension tables to detail the process. A fact table is the central table in star
schema or snowflake schema, and tends to be the largest table of the schema. Both star and snowflake
schemas typically have one fact table.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


12

In a fact table, each row corresponds to one event or measurement, with some rows being associated
with a dimension table in the schema. Every row referencing a dimension table has its own foreign key,
which links the data in the fact table to the referenced dimension table. The foreign keys contained in
the fact table are also the primary keys of the referenced dimension tables.

Characteristics of a Fact Table


• Contains quantitative data about the events or measurements of a business process.
• Holds foreign keys (the primary keys) of referenced dimension tables, plus its own primary key.
• Often one, central table of star schema or snowflake schema.
• Often the largest table in star schema or snowflake schema.

What Is a Dimension Table?


A dimension table is a table in a data warehouse schema that stores qualitative data or descriptive
attributes for related data in the fact table, such as customer information, store location or product
name. Data inside a dimension table represent the different aspects of a business process, and is linked
alongside the events or measurements recorded in the fact table to help provide context. There are
typically multiple dimension tables that surround the central fact table in star schema or snowflake
schema.
Each dimension table in a schema holds its own primary key, which is used to identify an associated
record in the fact table. A dimension table’s primary key is known as a foreign key when inside the fact
table.

Characteristics of a Dimension Table


• Contains qualitative, descriptive data about the numerical data in a fact table.
• Holds its own primary key.
• Often multiple tables that surround the fact table in star schema or snowflake schema.
• Often smaller in size than the fact table in star schema or snowflake schema.

Fact Table vs. Dimension Table


Fact table: A fact table contains the primary keys of the referenced dimension tables along with some
quantitative metrics. Examples of a fact table include customer orders or time-series financial data.
Dimension table: A dimension table holds the descriptive information for the related fields that are in
the fact table’s records. It typically represents a physical entity like “customer” or “product.”
Dimension tables, which are in blue, correspond to the tables containing information about the
“Customers,” “Stores,” “Products” and “Dates.” These are the nouns of the business case.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


13

The fact table shown in orange contains all the primary keys (PK) of the dimension tables, which are
the foreign keys (FK) in the fact table, along with two quantitative fields — namely quantity and
amount.

Parameter Fact Table Dimension Table

Definition Facts about a business process, Descriptive characteristics in the


such as measurements or companion table to the fact table can be
metrics. utilized as query constraints.

Characteristic Positioned in the middle of a The edges of the snowflake or star schema,
snowflake or star schema, attached to the fact table,
surrounded by dimensions.

Design Defined by their grain or at the It must be extensive, in-depth, and of the
atomic level. highest calibre.

Task A fact table is a quantifiable Gathering of background data about a


event for which data from a company.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


14

dimension table is gathered and


used for reporting and analysis.

Data Type Facts tables may include data Each dimension table has attributes that
about sales in relation to a provide information about the specifics of
number of parameters, such as the dimension. For instance, product
Product and Date. dimensions may include the product ID,
category, etc.

Key The fact table's primary key is Each dimension in a dimension table
mapped as a foreign key to contains a primary key column that
dimensions. uniquely identifies it.

Storage Helps to save report labels and Load dimensional structures with thorough
filter domain values in atomic data.
dimension tables.

Hierarchy Contains no hierarchy. Hierarchies are present. For instance,


Location could include a country, state,
city, zip code, and more.

3.10. Data Warehouse Schemas


Schema is a logical description of the entire database. It includes the name and description of records of
all record types including all associated data-items and aggregates. Much like a database, a data
warehouse also requires to maintain a schema. A data warehouse uses Star, Snowflake, and Fact
Constellation schema.

3.10.1. Star Schema


Each dimension in a star schema is represented with only one-dimension table. This dimension table
contains the set of attributes. There is a fact table at the center. It contains the keys to each of four
dimensions.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


15

The following diagram shows the sales data of a company with respect to the four dimensions, namely
time, item, branch, and location.

The fact table also contains the attributes, namely dollars sold and units sold.
Note − Each dimension has only one dimension table and each table holds a set of attributes.
For example, the location dimension table contains the attribute set
{location_key, street, city, province_or_state,country}.

This constraint may cause data redundancy. For example, "Vancouver" and "Victoria" both the cities
are in the Canadian province of British Columbia. The entries for such cities may cause data
redundancy along the attributes province_or_state and country.

Advantages of Star Schema


• Simpler Queries: Join logic of star schema is quite cinch in comparison to other join logic
which are needed to fetch data from a transactional schema that is highly normalized.
• Simplified Business Reporting Logic: In comparison to a transactional schema that is highly
normalized, the star schema makes simpler common business reporting logic, such as of
reporting and period-over-period.
• Feeding Cubes: Star schema is widely used by all OLAP systems to design OLAP cubes
efficiently. In fact, major OLAP systems deliver a ROLAP mode of operation which can use a
star schema as a source without designing a cube structure.

Disadvantages of Star Schema


• Data integrity is not enforced well since in a highly de-normalized schema state.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


16

• Not flexible in terms if analytical needs as a normalized data model.


• Star schema doesn’t reinforce many-to-many relationships within business entities – at least not
frequently.

3.10.2. Snowflake Schema


Some dimension tables in the Snowflake schema are normalized. The normalization splits up the data
into additional tables. Unlike Star schema, the dimensions table in a snowflake schema are normalized.
For example, the item dimension table in star schema is normalized and split into two dimension tables,
namely item and supplier table.

Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier-
key. The supplier key is linked to the supplier dimension table. The supplier dimension table contains
the attributes supplier_key and supplier_type.
Due to normalization in the Snowflake schema, the redundancy is reduced and therefore, it becomes
easy to maintain and the save storage space.

Advantages of Snowflake Schema


• It provides structured data which reduces the problem of data integrity.
• It uses small disk space because data are highly structured.

Disadvantages of Snowflake Schema


• Do not snowflake hierarchies of dimension table into separate tables. Hierarchies should belong
to the dimension table only and should never be snowflakes.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


17

• Multiple hierarchies that can belong to the same dimension have been designed at the lowest
possible detail.

Comparison between Star and Snowflake Schema

Star Schema Snowflake Schema

In star schema, The fact tables and While in snowflake schema, The fact tables, dimension
the dimension tables are contained. tables as well as sub dimension tables are contained.

Star schema is a top-down model. While it is a bottom-up model.

Star schema uses more space. While it uses less space.

It takes less time for the execution of While it takes more time than star schema for the
queries. execution of queries.

In star schema, Normalization is not While in this, Both normalization and denormalization
used. are used.

It’s design is very simple. While it’s design is complex.

The query complexity of star schema While the query complexity of snowflake schema is
is low. higher than star schema.

It’s understanding is very simple. While it’s understanding is difficult.

It has less number of foreign keys. While it has more number of foreign keys.

It has high data redundancy. While it has low data redundancy.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


18

3.10.3. Fact Constellation Schema


A fact constellation has multiple fact tables. It is also known as galaxy schema.
The following diagram shows two fact tables, namely sales and shipping.

The sales fact table is same as that in the star schema. The shipping fact table has the five dimensions,
namely item_key, time_key, shipper_key, from_location, to_location. The shipping fact table also
contains two measures, namely dollars sold and units sold.
It is also possible to share dimension tables between fact tables. For example, time, item, and location
dimension tables are shared between the sales and shipping fact table.

Advantages
• Tables are subdivided into fact and dimensional to understand the relationship between them.
• It is a flexible schema that makes users use it.
• Here dimensional tables are shared by the number of fact tables.
• It is a normalized form of snowflake and star schema.
• We can access the data in the database using complex queries.

Disadvantages
• It is difficult to understand as it is a very complex schema to implement.
• It uses more space in the database comparative to the star schema.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


19

• It has many joins between dimensional and fact tables and thus it is difficult to understand.
• This is difficult to maintain and operate.

3.11. OLAP Operations in Multidimensional Data Models


OLAP provides various operations to gain insights from the data stored in multidimensional
hypercubes. These operations include:

Roll-up
It is the opposite of the drill-down operation and is also known as a drill-up or aggregation operation. It
is a dimension-reduction technique that performs aggregation on a data cube. It makes the data less
detailed and it can be performed by combining similar dimensions across any axis.
Example: Considering the above-mentioned clothing company sales example:

Here, we are performing the Roll-up operation on the given data cube by combining and categorizing
the sales based on the countries instead of cities.

Drill-Down
Drill down operation allows a user to zoom in on the data cube i.e., the less detailed data is converted
into highly detailed data. It can be implemented by either stepping down a concept hierarchy for a
dimension or adding additional dimensions to the hypercube.
Example: Consider a cube that represents the annual sales (4 Quarters: Q1, Q2, Q3, Q4) of various
kinds of clothes (Shirt, Pant, Shorts, Tees) of a company in 4 cities (Delhi, Mumbai, Las Vegas, New
York) as shown below.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


20

In the figure, the drill-down operation is applied on the time dimension and the quarter Q1 is drilled
down to January, February, and March. Hence, by applying the drill-down operation, we can move
down from quarterly sales in a year to monthly or weekly records.

Slice
Slice operation is used to select a single dimension from the given cube to generate a new sub-cube. It
represents the information from another point of view.
Example: Considering our clothing company sales example:

Here, the sales done by the company during the first quarter are retrieved by performing the slice
operation on the given hypercube.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


21

Dice
Dice operation is used to generate a new sub-cube from the existing hypercube. It selects two or more
dimensions from the hypercube to generate a new sub-cube for the given data.
Example: Considering our clothing company sales example.
In the figure, we are using the dice operation to retrieve the sales done by the company in the first half
of the year i.e., the sales in the first two quarters.

Pivot (Rotate)
It is used to provide an alternate view of the data available to the users. It is also known as Rotate
operation as it rotates the cube’s orientation to view the data from different perspectives.
Example: Considering our clothing company sales example:

Here, we are using the Pivot operation to view the sub-cube from a different perspective.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


22

3.12. Types of OLAP Servers

ROLAP
ROLAP is relational OLAP where the data is arranged in traditional methods like rows and columns in
the data warehouse. It is visible and accessible to users in multi-dimensional form. To display it as a
multi-dimensional view the data is designed as the related layer of metadata which supports the
collection and storage of data. It does dynamically in handling the complex query. It is slower than
MOLAP where ROLAP deals with the enormous volume of data at a higher speed.
Benefits
• It is compatible with data warehouses and OLTP systems.
• The data size limitation of ROLAP technology is determined by the underlying RDBMS. As a
result, ROLAP does not limit the amount of data that can be stored.
Limitations
• SQL functionality is constrained.
• It’s difficult to keep aggregate tables up to date.

MOLAP
MOLAP is a multi-dimensional OLAP where the data is analyzed on the registered system. The data is
arranged in a multi-dimensional array. The array carries predefined data when the data is loaded in
database management. MOLAP system is implemented on the application layer and when the user
sends any request it fetches the data with the minimum response time.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


23

Benefits
• Suitable for slicing and dicing operations.
• Outperforms ROLAP when data is dense.
• Capable of performing complex calculations.
Limitations
• It is difficult to change the dimensions without re-aggregating.
• Since all calculations are performed when the cube is built, a large amount of data cannot be
stored in the cube itself.

HOLAP
ROLAP and MOLAP are combined in Hybrid On-Line Analytical Processing (HOLAP). It offers
greater scalability than ROLAP and faster computation than MOLAP.
HOLAP servers are capable of storing large amounts of detailed data. HOLAP benefits from ROLAP’s
greater scalability. HOLAP, on the other hand, makes use of cube technology for faster performance
and summary-type information. Because detailed data is stored in a relational database, cubes are
smaller than MOLAP.
Benefits
• HOLAP combines the benefits of MOLAP and ROLAP.
• Provide quick access at all aggregation levels.
Limitations
• HOLAP architecture is complex.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


24

• There is a greater likelihood of overlap, particularly in their functionalities.

Comparison between ROLAP, MOLAP & HOLAP


Parameter ROLAP MOLAP HOLAP
Multi-dimensional
Relational online Hybrid online
Acronym online analytical
analytical processing analytical processing
processing
Data is stored on the
Data is stored on the Data is stored on the
Storage methods registered database
main data warehouse relational databases
MDDB
Data is fetched from
Data is fetched from Data is fetched from the
Fetching methods the relational
the main repository Proprietary database
databases
Data is arranged and
Data is arranged and Data is arranged in
saved in the form of
Data Arrangement stored in the form of multi-dimensional
tables with rows and
data cubes form
columns
Limited data which is
Enormous data is Large data can be
Volume kept in proprietary is
processed processed
processed
It uses both Sparse
It works with Sparse
Technique It works with SQL matrix technology
Matrix technology
and SQL
It has dynamic
Designed view It has dynamic access It has a static access
access
It has Maximum It has Minimum It takes Minimum
Response time
response time response time response time

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


25

OALP and OLTP


Online analytical processing (OLAP) and online transaction processing (OLTP) are data processing
systems that help you store and analyze business data. You can collect and store data from multiple
sources—such as websites, applications, smart meters, and internal systems. OLAP combines and
groups the data so you can analyze it from different points of view. Conversely, OLTP stores and
updates transactional data reliably and efficiently in high volumes. OLTP databases can be one among
several data sources for an OLAP system.

What are the similarities between OLAP and OLTP?


Both online analytical processing (OLAP) and online transaction processing (OLTP) are database
management systems for storing and processing data in large volumes. They require efficient and
reliable IT infrastructure to run smoothly. You can use them both to query existing data or store new
data. Both support data-driven decision-making in an organization.
Most companies use OLTP and OLAP systems together to meet their business intelligence
requirements. However, the approach to and purpose of data management differ significantly between
OLAP and OLTP.

Example of OLAP vs. OLTP


Let's consider a large retail company that operates hundreds of stores across the country. The company
has a massive database that tracks sales, inventory, customer data, and other key metrics.
The company uses OLTP to process transactions in real time, update inventory levels, and manage
customer accounts. Each store is connected to the central database, which updates the inventory levels
in real time as products are sold. The company also uses OLTP to manage customer accounts—for
example, to track loyalty points, manage payment information, and process returns.
In addition, the company uses OLAP to analyze the data collected by OLTP. The company’s business
analysts can use OLAP to generate reports on sales trends, inventory levels, customer demographics,
and other key metrics. They perform complex queries on large volumes of historical data to identify
patterns and trends that can inform business decisions. They identify popular products in a given time
period and use the information to optimize inventory budgets.

When to use OLAP vs. OLTP


Online analytical processing (OLAP) and online transaction processing (OLTP) are two different data
processing systems designed for different purposes. OLAP is optimized for complex data analysis and
reporting, while OLTP is optimized for transactional processing and real-time updates.
Understanding the differences between these systems can help you make informed decisions about
which system meets your needs better. In many cases, a combination of both OLAP and OLTP systems

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI


26

may be the best solution for businesses that require both transaction processing and data analysis.
Ultimately, choosing the right system depends on the specific needs of your business, including data
volume, query complexity, response time, scalability, and cost.

Criteria OLAP OLTP

OLAP helps you analyze large volumes of OLTP helps you manage and process
Purpose
data to support decision-making. real-time transactions.

OLAP uses historical and aggregated data OLTP uses real-time and transactional
Data source
from multiple sources. data from a single source.

OLAP uses multidimensional (cubes) or


Data structure OLTP uses relational databases.
relational databases.

OLAP uses star schema, snowflake schema, OLTP uses normalized or denormalized
Data model
or other analytical models. models.

OLAP has large storage requirements. Think OLTP has comparatively smaller storage
Volume of data
terabytes (TB) and petabytes (PB). requirements. Think gigabytes (GB).

OLAP has longer response times, typically OLTP has shorter response times,
Response time
in seconds or minutes. typically in milliseconds

OLAP is good for analyzing trends, OLTP is good for processing payments,
Example
predicting customer behavior, and customer data management, and order
applications
identifying profitability. processing.

Compiled by: Er Rupesh Shrestha DCOM III/I, NPI

You might also like