Unit I Data Warehouse and OLAP

A data warehouse is a specialized relational database designed for query and analysis, integrating historical data from various sources to support management decision-making. It is characterized by being subject-oriented, integrated, time-variant, and non-volatile, allowing for complex data analysis without affecting operational systems. The architecture of a data warehouse includes components like data staging, storage, and delivery, and employs an ETL process for data extraction, transformation, and loading.

What is a Data Warehouse?

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” — Bill Inmon, the Father of Data Warehousing.

According to Sean Kelly:

“A data warehouse is a separate, available, integrated, time-stamped, subject-oriented, nonvolatile, accessible collection of data in support of management’s decision-making process.”

A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than transaction processing. It includes historical data derived from transaction data from one or more sources.

A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on providing support for decision-makers for data modeling and analysis.

A Data Warehouse is a group of data specific to the entire organization, not only to a
particular group of users.

It is not used for daily operations and transaction processing but used for making
decisions.

A Data Warehouse can be viewed as a data system with the following attributes:

o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of
information.
o Its usage is read-intensive.
o It contains a few large tables.

"Data Warehouse is a subject-oriented, integrated, and time-variant store of information


in support of management's decisions."
Characteristics of Data Warehouse

Subject-Oriented
A data warehouse targets the modeling and analysis of data for decision-makers. Therefore, data warehouses typically provide a concise and straightforward view of a particular subject, such as customer, product, or sales, instead of the organization's ongoing global operations. This is done by excluding data that are not useful concerning the subject and including all data needed by the users to understand the subject.
Integrated
A data warehouse integrates various heterogeneous data sources, such as RDBMSs, flat files, and online transaction records. It requires performing data cleaning and integration during data warehousing to ensure consistency in naming conventions, attribute types, etc., among the different data sources.
Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even earlier from a data warehouse. This contrasts with a transaction system, where often only the most current data is kept.
Non-Volatile
The data warehouse is a physically separate data store, into which data is transformed from the source operational RDBMS. Operational updates of data do not occur in the data warehouse, i.e., update, insert, and delete operations are not performed. It usually requires only two procedures in data accessing: the initial loading of data and read access to the data. Therefore, the DW does not require transaction processing, recovery, and concurrency capabilities, which allows for a substantial speedup of data retrieval. Non-volatile means that, once entered into the warehouse, data should not change.

History of Data Warehouse


The idea of data warehousing dates to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy established the "Business Data Warehouse."

In essence, the data warehousing idea was intended to support an architectural model for the flow of information from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly its high costs.

In the absence of a data warehousing architecture, a vast amount of space was required to support multiple decision support environments. In large corporations, it was common for various decision support environments to operate independently.

Goals of Data Warehousing


o To support reporting as well as analysis
o To maintain the organization's historical information
o To be the foundation for decision making

Need for Data Warehouse


Data Warehouse is needed for the following reasons:

1. Business User: Business users require a data warehouse to view summarized data
from the past. Since these people are non-technical, the data may be presented to
them in an elementary form.
2. Store historical data: A data warehouse is required to store time-variant data
from the past. This input can then be used for various purposes.
3. Make strategic decisions: Some strategies may depend upon the data in the
data warehouse. So, the data warehouse contributes to making strategic decisions.
4. For data consistency and quality: By bringing data from different sources to a
common place, the user can effectively bring uniformity and consistency to the data.
5. High response time: A data warehouse has to be ready for somewhat unexpected
loads and types of queries, which demands a significant degree of flexibility and
quick response time.
Benefits of Data Warehouse
1. Understand business trends and make better forecasting decisions.
2. Data warehouses are designed to perform well with enormous amounts of data.
3. The structure of data warehouses is easier for end-users to navigate, understand, and query.
4. Queries that would be complex in many normalized databases can be easier to build and maintain in data warehouses.
5. Data warehousing is an efficient method to manage demand for lots of information from lots of users.
6. Data warehousing provides the capability to analyze large amounts of historical data.

Prerequisites
Before learning about Data Warehouse, you must have the fundamental knowledge of
basic database concepts such as schema, ER model, structured query language, etc.

Components or Building Blocks of Data Warehouse
Architecture is the proper arrangement of the elements. We build a data warehouse with software and hardware components. To suit the requirements of our organization, we arrange these building blocks, and we may want to strengthen one part with extra tools and services. All of this depends on our circumstances.
The figure shows the essential elements of a typical warehouse. The Source Data component is shown on the left. The Data Staging element serves as the next building block. In the middle is the Data Storage component that handles the data warehouse's data. This element not only stores and manages the data; it also keeps track of the data using the metadata repository. The Information Delivery component, shown on the right, consists of all the different ways of making the information from the data warehouse available to the users.

Source Data Component


Source data coming into the data warehouse may be grouped into four broad categories:

Production Data: This type of data comes from the different operational systems of the enterprise. Based on the data requirements in the data warehouse, we choose segments of the data from the various operational systems.

Internal Data: In each organization, users keep their "private" spreadsheets, reports, customer profiles, and sometimes even departmental databases. This is the internal data, part of which could be useful in a data warehouse.

Archived Data: Operational systems are mainly intended to run the current business. In every operational system, we periodically take the old data and store it in archived files.

External Data: Most executives depend on information from external sources for a large percentage of the information they use. They use statistics relating to their industry produced by external agencies.

Data Staging Component


After we have extracted data from various operational systems and external sources, we have to prepare the data for storing in the data warehouse. The extracted data coming from several different sources needs to be changed, converted, and made ready in a format that is suitable to be saved for querying and analysis.

We will now discuss the three primary functions that take place in the staging area.

1) Data Extraction: This method has to deal with numerous data sources. We have to employ the appropriate technique for each data source.

2) Data Transformation: As we know, data for a data warehouse comes from many different sources. If data extraction for a data warehouse poses big challenges, data transformation presents even more significant challenges. We perform several individual tasks as part of data transformation.

First, we clean the data extracted from each source. Cleaning may be the correction of misspellings, may deal with providing default values for missing data elements, or may be the elimination of duplicates when we bring in the same data from various source systems.

Standardization of data elements forms a large part of data transformation. Data transformation also contains many forms of combining pieces of data from different sources: we combine data from a single source record or related data parts from many source records.

On the other hand, data transformation also involves purging source data that is not useful and separating out source records into new combinations. Sorting and merging of data take place on a large scale in the data staging area. When the data transformation function ends, we have a collection of integrated data that is cleaned, standardized, and summarized.
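
As a rough sketch of these cleaning tasks in Python (the field names, synonym dictionary, defaults, and natural key below are illustrative assumptions, not part of any particular ETL tool):

SYNONYMS = {"colour": "color", "NY": "New York"}   # rectification dictionary for misspellings/synonyms
DEFAULTS = {"region": "UNKNOWN", "units": 0}       # default values for missing data elements

def clean_record(record: dict) -> dict:
    """Correct known misspellings and supply defaults for missing elements."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = SYNONYMS.get(value.strip(), value.strip())
        cleaned[field] = value
    for field, default in DEFAULTS.items():
        cleaned.setdefault(field, default)
    return cleaned

def deduplicate(records):
    """Eliminate duplicates brought in from multiple source systems."""
    seen, unique = set(), []
    for rec in map(clean_record, records):
        key = (rec.get("customer_id"), rec.get("order_id"))  # assumed natural key
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

staged = deduplicate([
    {"customer_id": 1, "order_id": 10, "city": "NY", "units": 2},
    {"customer_id": 1, "order_id": 10, "city": "New York", "units": 2},  # same record from another source
    {"customer_id": 2, "order_id": 11, "city": "Delhi"},                 # missing region and units
])
print(staged)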

3) Data Loading: Two distinct categories of tasks form the data loading function. When we complete the structure and construction of the data warehouse and go live for the first time, we do the initial loading of the data into the data warehouse storage. The initial load moves high volumes of data and uses up a substantial amount of time; after that, periodic incremental loads apply the ongoing changes captured from the source systems.
Data Storage Component
Data storage for the data warehouse is a separate repository. The data repositories for the operational systems generally include only the current data. Also, these repositories hold the data structured in a highly normalized form for fast and efficient processing.

Information Delivery Component


The information delivery element is used to enable the process of subscribing to data warehouse files and having them transferred to one or more destinations according to some customer-specified scheduling algorithm.

Metadata Component
Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management system. In the data dictionary, we keep the data about the logical data structures, the data about the records and addresses, the information about the indexes, and so on.

Data Marts
It includes a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to particular selected subjects. Data in a data warehouse should be fairly current, but not necessarily up to the minute, although developments in the data warehouse industry have made frequent and incremental data loads more achievable. Data marts are smaller than data warehouses and usually contain data for a single department or group of users. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for particular kinds of queries and reports.

Management and Control Component


The management and control elements coordinate the services and functions within the data warehouse. These components control the data transformation and the data transfer into the data warehouse storage. They also moderate the data delivery to the clients, work with the database management systems, and authorize data to be correctly saved in the repositories. They monitor the movement of data into the staging area and from there into the data warehouse storage itself.

Why we need a separate Data Warehouse?


Data Warehouse queries are complex because they involve the computation of large
groups of data at summarized levels.

It may require the use of distinctive data organization, access, and implementation
method based on multidimensional views.

Performing OLAP queries in an operational database degrades the performance of operational tasks.

A data warehouse is used for analysis and decision making, for which an extensive database is required, including historical data, which an operational database does not typically maintain.

The separation of an operational database from data warehouses is based on the different
structures and uses of data in these systems.

Because the two systems provide different functionalities and require different kinds of
data, it is necessary to maintain separate databases.

Data Warehouse Architecture


A data warehouse architecture is a method of defining the overall architecture of data communication, processing, and presentation that exists for end-client computing within the enterprise. Each data warehouse is different, but all are characterized by standard vital components.

Production applications such as payroll, accounts payable, product purchasing, and inventory control are designed for online transaction processing (OLTP). Such applications gather detailed data from day-to-day operations.

Data warehouse applications are designed to support the user's ad-hoc data requirements, an activity recently dubbed online analytical processing (OLAP). These include applications such as forecasting, profiling, summary reporting, and trend analysis.

Production databases are updated continuously, either by hand or via OLTP applications. In contrast, a warehouse database is updated from operational systems periodically, usually during off-hours. As OLTP data accumulates in production databases, it is regularly extracted, filtered, and then loaded into a dedicated warehouse server that is accessible to users. As the warehouse is populated, it must be restructured: tables de-normalized, data cleansed of errors and redundancies, and new fields and keys added to reflect the users' needs for sorting, combining, and summarizing data.

Data warehouses and their architectures vary depending upon the elements of an organization's situation.

Three common architectures are:

o Data Warehouse Architecture: Basic
o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

Data Warehouse Architecture: Basic


Operational System

An operational system is a term used in data warehousing to refer to a system that processes the day-to-day transactions of an organization.

Flat Files

A Flat file system is a system of files in which transactional data is stored, and every file
in the system must have a different name.

Meta Data

A set of data that defines and gives information about other data.

Metadata is used in a data warehouse for a variety of purposes, including:

Metadata summarizes necessary information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata.

Metadata is used to direct a query to the most appropriate data source.

Lightly and highly summarized data

This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager.

The goal of the summarized information is to speed up query performance. The summarized records are updated continuously as new information is loaded into the warehouse.

End-User access Tools

The principal purpose of a data warehouse is to provide information to business managers for strategic decision-making. These customers interact with the warehouse using end-client access tools.

The examples of some of the end-user access tools can be:

o Reporting and Query Tools
o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools

Data Warehouse Architecture: With Staging Area


We must clean and process our operational data before putting it into the warehouse.

We can do this programmatically, although most data warehouses use a staging area (a place where data is processed before entering the warehouse).

A staging area simplifies data cleansing and consolidation for operational data coming from multiple source systems, especially for enterprise data warehouses where all relevant data of an enterprise is consolidated.

The Data Warehouse Staging Area is a temporary location where records from source systems are copied.
Data Warehouse Architecture: With Staging Area and Data Marts
We may want to customize our warehouse's architecture for multiple groups within our
organization.

We can do this by adding data marts. A data mart is a segment of a data warehouse that can provide information for reporting and analysis on a section, unit, department, or operation in the company, e.g., sales, payroll, production, etc.

The figure illustrates an example where purchasing, sales, and stocks are separated. In this
example, a financial analyst wants to analyze historical data for purchases and sales or
mine historical information to make predictions about customer behavior.

ETL (Extract, Transform, and Load) Process


What is ETL?
The mechanism of extracting information from source systems and bringing it into the data warehouse is commonly called ETL, which stands for Extraction, Transformation and Loading.
The ETL process requires active inputs from various stakeholders, including developers, analysts, testers, and top executives, and is technically challenging.

To maintain its value as a tool for decision-makers, a data warehouse needs to change with business changes. ETL is a recurring process (daily, weekly, monthly) of a data warehouse system and needs to be agile, automated, and well documented.

How ETL Works?


ETL consists of three separate phases: extraction, transformation (which includes cleansing), and loading.

Extraction
o Extraction is the operation of extracting information from a source system for further use in a data warehouse environment. This is the first stage of the ETL process.
o The extraction process is often one of the most time-consuming tasks in the ETL.
o The source systems might be complicated and poorly documented, and thus determining which data needs to be extracted can be difficult.
o The data has to be extracted several times in a periodic manner to supply all changed data to the warehouse and keep it up-to-date, as sketched below.
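
As a rough illustration of this periodic, incremental extraction, many pipelines track a "high-water mark" timestamp recording the last change already extracted. The sketch below uses SQLite, and the orders table and updated_at column are assumed names:

import sqlite3

def extract_changed_rows(conn: sqlite3.Connection, last_extracted: str):
    """Pull only rows changed since the previous run (incremental extraction)."""
    rows = conn.execute(
        "SELECT order_id, customer_id, amount, updated_at "
        "FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted,),
    ).fetchall()
    # New high-water mark: the latest change we have now seen.
    new_mark = rows[-1][3] if rows else last_extracted
    return rows, new_mark

# Tiny in-memory source system so the example is runnable end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, amount REAL, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, 100, 25.0, "2024-01-01"), (2, 101, 40.0, "2024-01-03")])
rows, mark = extract_changed_rows(conn, "2024-01-02")
print(rows, mark)   # only the order changed after the last extraction is returned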

Cleansing
The cleansing stage is crucial in a data warehouse because it is supposed to improve data quality. The primary data cleansing features found in ETL tools are rectification and homogenization. They use specific dictionaries to rectify typing mistakes and to recognize synonyms, as well as rule-based cleansing to enforce domain-specific rules and define appropriate associations between values.

The following examples show the importance of data cleaning:

If an enterprise wishes to contact its users or its suppliers, a complete, accurate and up-to-date list of contact addresses, email addresses and telephone numbers must be available.

If a client or supplier calls, the staff responding should be able to quickly find the person in the enterprise database, but this requires that the caller's name or his/her company name is listed in the database.

If a user appears in the database with two or more slightly different names or different account numbers, it becomes difficult to update the customer's information.

Transformation
Transformation is the core of the reconciliation phase. It converts records from their operational source format into a particular data warehouse format. If we implement a three-layer architecture, this phase outputs our reconciled data layer.

The following points must be rectified in this phase:

o Loose texts may hide valuable information. For example, "XYZ PVT Ltd" does not explicitly show that this is a private limited company.
o Different formats can be used for individual data. For example, a date can be saved as a string or as three integers.
Following are the main transformation processes aimed at populating the reconciled data
layer:

o Conversion and normalization that operate on both storage formats and units of measure
to make data uniform.
o Matching that associates equivalent fields in different sources.
o Selection that reduces the number of source fields and records.

Cleansing and Transformation processes are often closely linked in ETL tools.

Loading
The load is the process of writing the data into the target database. During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible.
Loading can be carried out in two ways:

1. Refresh: Data warehouse data is completely rewritten. This means that older data is replaced. Refresh is usually used in combination with static extraction to populate a data warehouse initially.
2. Update: Only the changes applied to source information are added to the data warehouse. An update is typically carried out without deleting or modifying preexisting data. This method is used in combination with incremental extraction to update data warehouses regularly.
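
A minimal sketch of the two loading modes, using SQLite for illustration (the sales_dw table and its columns are assumed names):

import sqlite3

def load_refresh(dw: sqlite3.Connection, extracted_rows):
    """Refresh: completely rewrite the warehouse data (used with static extraction)."""
    dw.execute("DELETE FROM sales_dw")                      # older data is replaced
    dw.executemany("INSERT INTO sales_dw VALUES (?, ?, ?)", extracted_rows)

def load_update(dw: sqlite3.Connection, changed_rows):
    """Update: append only the changes, without deleting or modifying preexisting
    data (used with incremental extraction)."""
    dw.executemany("INSERT INTO sales_dw VALUES (?, ?, ?)", changed_rows)

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_dw (day INT, item TEXT, units INT)")
load_refresh(dw, [(1, "widget", 5), (2, "gadget", 3)])      # initial population
load_update(dw, [(3, "widget", 2)])                         # later, periodic incremental update
print(dw.execute("SELECT * FROM sales_dw").fetchall())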

Properties of Data Warehouse Architectures


The following architecture properties are necessary for a data warehouse system:

1. Separation: Analytical and transactional processing should be kept apart as much as possible.

2. Scalability: Hardware and software architectures should be simple to upgrade as the data volume, which has to be managed and processed, and the number of users' requirements, which have to be met, progressively increase.

3. Extensibility: The architecture should be able to host new applications and technologies without redesigning the whole system.

4. Security: Monitoring accesses is necessary because of the strategic data stored in the data warehouse.

5. Administerability: Data warehouse management should not be complicated.

Data Warehouse Design


A data warehouse is a single data repository where records from multiple data sources are integrated for online business analytical processing (OLAP). This implies a data warehouse needs to meet the requirements of all the business stages within the entire organization. Thus, data warehouse design is a hugely complex, lengthy, and hence error-prone process. Furthermore, business analytical functions change over time, which results in changes in the requirements for the systems. Therefore, data warehouse and OLAP systems are dynamic, and the design process is continuous.

Data warehouse design takes a method different from view materialization in the industries. It sees data warehouses as database systems with particular needs, such as answering management-related queries. The target of the design becomes how the records from multiple data sources should be extracted, transformed, and loaded (ETL) to be organized in a database as the data warehouse.

There are two approaches

1. "top-down" approach
2. "bottom-up" approach

Top-down Design Approach


In the "Top-Down" design approach, a data warehouse is described as a subject-oriented,
time-variant, non-volatile and integrated data repository for the entire enterprise data
from different sources are validated, reformatted and saved in a normalized (up to 3NF)
database as the data warehouse. The data warehouse stores "atomic" information, the
data at the lowest level of granularity, from where dimensional data marts can be built by
selecting the data required for specific business subjects or particular departments. An
approach is a data-driven approach as the information is gathered and integrated first
and then business requirements by subjects for building data marts are formulated. The
advantage of this method is which it supports a single integrated data source. Thus data
marts built from it will have consistency when they overlap.

Advantages of top-down design

Data marts are loaded from the data warehouse.

Developing a new data mart from the data warehouse is very easy.

Disadvantages of top-down design

This technique is inflexible to changing departmental needs.

The cost of implementing the project is high.


Bottom-Up Design Approach
In the "Bottom-Up" approach, a data warehouse is described as "a copy of transaction
data specifical architecture for query and analysis," term the star schema. In this approach,
a data mart is created first to necessary reporting and analytical capabilities for particular
business processes (or subjects). Thus it is needed to be a business-driven approach in
contrast to Inmon's data-driven approach.

Data marts include the lowest grain data and, if needed, aggregated data too. Instead of
a normalized database for the data warehouse, a denormalized dimensional database is
adapted to meet the data delivery requirements of data warehouses. Using this method,
to use the set of data marts as the enterprise data warehouse, data marts should be built
with conformed dimensions in mind, defining that ordinary objects are represented the
same in different data marts. The conformed dimensions connected the data marts to
form a data warehouse, which is generally called a virtual data warehouse.
The advantage of the "bottom-up" design approach is that it has quick ROI, as developing
a data mart, a data warehouse for a single subject, takes far less time and effort than
developing an enterprise-wide data warehouse. Also, the risk of failure is even less. This
method is inherently incremental. This method allows the project team to learn and grow.

Advantages of bottom-up design

Documents can be generated quickly.

The data warehouse can be extended to accommodate new business units.

New data marts can be developed easily and then integrated with the existing data marts.

Disadvantages of bottom-up design

The positions of the data warehouse and the data marts are reversed in the bottom-up design approach.

Differentiate between Top-Down Design Approach and Bottom-Up Design Approach

o Top-down: Breaks the vast problem into smaller subproblems. Bottom-up: Solves the essential low-level problems and integrates them into a higher one.
o Top-down: Inherently architected; not a union of several data marts. Bottom-up: Inherently incremental; can schedule essential data marts first.
o Top-down: Single, central storage of information about the content. Bottom-up: Departmental information stored.
o Top-down: Centralized rules and control. Bottom-up: Departmental rules and control.
o Top-down: Includes redundant information. Bottom-up: Redundancy can be removed.
o Top-down: May see quick results if implemented with iterations. Bottom-up: Less risk of failure, favorable return on investment, and proof of techniques.

What is Meta Data?


Metadata is data about the data or documentation about the information which is
required by the users. In data warehousing, metadata is one of the essential aspects.

Metadata includes the following:

1. The location and descriptions of warehouse systems and components.
2. Names, definitions, structures, and content of the data warehouse and end-user views.
3. Identification of authoritative data sources.
4. Integration and transformation rules used to populate data.
5. Integration and transformation rules used to deliver information to end-user
analytical tools.
6. Subscription information for information delivery to analysis subscribers.
7. Metrics used to analyze warehouses usage and performance.
8. Security authorizations, access control list, etc.

Metadata is used for building, maintaining, managing, and using the data warehouse. It allows users to understand the content and find data.

What is Data Mart?


A data mart is a subset of an organizational data store, generally oriented to a specific purpose or primary data subject, which may be distributed to support business needs. Data marts are analytical record stores designed to focus on particular business functions for a specific community within an organization. Data marts are derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology, the data warehouse is created from the union of organizational data marts.

The fundamental use of a data mart is for Business Intelligence (BI) applications. BI is used to gather, store, access, and analyze records. A data mart can be used by smaller businesses to utilize the data they have accumulated, since it is less expensive than implementing a data warehouse.
Reasons for creating a data mart
o Creates collective data for a group of users
o Easy access to frequently needed data
o Ease of creation
o Improves end-user response time
o Lower cost than implementing a complete data warehouses
o Potential clients are more clearly defined than in a comprehensive data warehouse
o It contains only essential business data and is less cluttered.

Types of Data Marts


There are mainly two approaches to designing data marts. These approaches are

o Dependent Data Marts


o Independent Data Marts

Dependent Data Marts


A dependent data mart is a logical subset or a physical subset of a larger data warehouse. According to this technique, data marts are treated as subsets of the data warehouse. Firstly, a data warehouse is created, from which various data marts can then be created. These data marts are dependent on the data warehouse and extract the essential records from it. Since the data warehouse creates the data marts, there is no need for data mart integration. This is also known as a top-down approach.
Independent Data Marts
The second approach is independent data marts (IDM). Here, firstly independent data marts are created, and then a data warehouse is designed using these multiple independent data marts. In this approach, as all the data marts are designed independently, the integration of data marts is required. It is also termed a bottom-up approach, as the data marts are integrated to develop a data warehouse.

Other than these two categories, one more type exists that is called "Hybrid Data Marts."
Hybrid Data Marts
It allows us to combine input from sources other than a data warehouse. This can be helpful in many situations, especially when ad hoc integration is needed, such as after a new group or product is added to the organization.

Steps in Implementing a Data Mart


The significant steps in implementing a data mart are to design the schema, construct the
physical storage, populate the data mart with data from source systems, access it to make
informed decisions and manage it over time. So, the steps are:

Designing
The design step is the first in the data mart process. This phase covers all of the functions
from initiating the request for a data mart through gathering data about the requirements
and developing the logical and physical design of the data mart.

It involves the following tasks:

1. Gathering the business and technical requirements


2. Identifying data sources
3. Selecting the appropriate subset of data
4. Designing the logical and physical architecture of the data mart.

Constructing
This step contains creating the physical database and logical structures associated with
the data mart to provide fast and efficient access to the data.

It involves the following tasks:

1. Creating the physical database and logical structures such as tablespaces associated with
the data mart.
2. Creating the schema objects, such as tables and indexes, described in the design step.
3. Determining how best to set up the tables and access structures.
Populating
This step includes all of the tasks related to the getting data from the source, cleaning it
up, modifying it to the right format and level of detail, and moving it into the data mart.

It involves the following tasks:

1. Mapping data sources to target data sources


2. Extracting data
3. Cleansing and transforming the information.
4. Loading data into the data mart
5. Creating and storing metadata

Accessing
This step involves putting the data to use: querying the data, analyzing it, creating reports,
charts and graphs and publishing them.

It involves the following tasks:

1. Set up an intermediate layer (meta layer) for the front-end tool to use. This layer translates database operations and object names into business terms so that the end-clients can interact with the data mart using words that relate to the business functions.
2. Set up and manage database structures, like summarized tables, which help queries submitted through the front-end tools execute rapidly and efficiently.

Managing
This step involves managing the data mart over its lifetime. In this step, the following management functions are performed:

1. Providing secure access to the data.
2. Managing the growth of the data.
3. Optimizing the system for better performance.
4. Ensuring the availability of data even with system failures.
Difference between Data Warehouse and Data Mart

o A data warehouse is a vast repository of information collected from various organizations or departments within a corporation; a data mart is only a subtype of a data warehouse, architected to meet the requirements of a specific user group.
o A data warehouse may hold multiple subject areas; a data mart holds only one subject area, for example, Finance or Sales.
o A data warehouse holds very detailed information; a data mart may hold more summarized data.
o A data warehouse works to integrate all data sources; a data mart concentrates on integrating data from a given subject area or set of source systems.
o In a data warehouse, the fact constellation schema is used; in a data mart, star and snowflake schemas are used.
o A data warehouse is a centralized system; a data mart is a decentralized system.
o Data warehousing is data-oriented; a data mart is project-oriented.

Difference between OLTP and OLAP

Characteristic: An OLTP system is used to manage operational data; an OLAP system is used to manage informational data.

Users: OLTP serves clerks, clients, and information technology professionals; OLAP serves knowledge workers, including managers, executives, and analysts.

System orientation: An OLTP system is customer-oriented, with transaction and query processing done by clerks, clients, and information technology professionals; an OLAP system is market-oriented, with data analysis done by knowledge workers, including managers, executives, and analysts.

Data contents: An OLTP system manages current data that is typically too detailed to be easily used for decision making; an OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages data at different levels of granularity, making the data easier to use for informed decision making.

Database size: 100 MB to GB for OLTP; 100 GB to TB for OLAP.

Database design: An OLTP system usually uses an entity-relationship (ER) data model and an application-oriented database design; an OLAP system typically uses either a star or snowflake model and a subject-oriented database design.

View: An OLTP system focuses primarily on the current data within an enterprise or department, without referring to historical data or data in different organizations; an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization, and also deals with data that originates from various organizations, integrating information from many data stores.

Volume of data: Not very large for OLTP; because of their large volume, OLAP data are stored on multiple storage media.

Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions, so such a system requires concurrency control and recovery techniques; accesses to OLAP systems are mostly read-only operations, because data warehouses store historical data.

Access mode: Read/write for OLTP; mostly read for OLAP.

Inserts and updates: Short and fast inserts and updates initiated by end-users in OLTP; periodic long-running batch jobs refresh the data in OLAP.

Number of records accessed: Tens for OLTP; millions for OLAP.

Normalization: Fully normalized for OLTP; partially normalized for OLAP.

Processing speed: Very fast for OLTP. For OLAP, it depends on the amount of data involved; batch data refreshes and complex queries may take many hours, and query speed can be improved by creating indexes.
What is Dimensional Modeling?
Dimensional modeling represents data with a cube operation, making a more suitable logical data representation for OLAP data management. The concept of dimensional modeling was developed by Ralph Kimball and consists of "fact" and "dimension" tables.

In dimensional modeling, the transaction record is divided into either "facts," which are frequently numerical transaction data, or "dimensions," which are the reference information that gives context to the facts. For example, a sale transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, user name, product number, order ship-to and bill-to locations, and the salesperson responsible for receiving the order.

Objectives of Dimensional Modeling


The purposes of dimensional modeling are:

1. To produce a database architecture that is easy for end-clients to understand and write queries against.
2. To maximize the efficiency of queries. It achieves these goals by minimizing the number of tables and relationships between them.

Advantages of Dimensional Modeling


Following are the benefits of dimensional modeling:

Dimensional modeling is simple: Dimensional modeling methods make it possible for warehouse designers to create database schemas that business customers can easily understand and comprehend. There is no need for extensive training on how to read diagrams, and there are no complicated relationships between different data elements.

Dimensional modeling promotes data quality: The star schema enables warehouse administrators to enforce referential integrity checks on the data warehouse. Since the fact table's key is a concatenation of the keys of its associated dimensions, a fact record is only loaded if the corresponding dimension records are duly defined and also exist in the database.

By enforcing foreign key constraints as a form of referential integrity check, data warehouse DBAs add a line of defense against corrupted warehouse data.

Performance optimization is possible through aggregates: As the size of the data warehouse increases, performance optimization becomes a pressing concern. Customers who have to wait for hours to get a response to a query will quickly become discouraged with the warehouse. Aggregates are one of the easiest methods by which query performance can be optimized, as sketched below.
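
For instance, here is a sketch of a precomputed aggregate: a summary table derived once from the detail-level fact table so that frequent queries avoid scanning every detail row (the table and column names are assumed):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (month INT, item TEXT, units INT);
INSERT INTO sales_fact VALUES (1, 'TV', 5), (1, 'TV', 3), (2, 'PC', 7);

-- The aggregate table answers month/item queries without touching the detail rows.
CREATE TABLE sales_monthly_agg AS
    SELECT month, item, SUM(units) AS total_units
    FROM sales_fact
    GROUP BY month, item;
""")
print(conn.execute("SELECT * FROM sales_monthly_agg").fetchall())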

Disadvantages of Dimensional Modeling


1. To maintain the integrity of facts and dimensions, loading the data warehouse with records from various operational systems is complicated.
2. It is difficult to modify the data warehouse if the organization adopting the dimensional technique changes the way in which it does business.

Elements of Dimensional Modeling


Fact
It is a collection of associated data items, consisting of measures and context data. It
typically represents business items or business transactions.

Dimensions
It is a collection of data which describe one business dimension. Dimensions decide the
contextual background for the facts, and they are the framework over which OLAP is
performed.

Measure
It is a numeric attribute of a fact, representing the performance or behavior of the business
relative to the dimensions.

Considering the relational context, there are two basic models which are used in
dimensional modeling:

o Star Model
o Snowflake Model
The star model is the underlying structure for a dimensional model. It has one broad central table (the fact table) and a set of smaller tables (the dimensions) arranged in a radial design around the central table. The snowflake model is the result of decomposing one or more of the dimensions.

Fact Table
Fact tables are used to store facts or measures in the business. Facts are the numeric data elements that are of interest to the company.

Characteristics of the Fact table

The fact table includes numerical values of what we measure. For example, a fact value of 20 might mean that 20 widgets have been sold.

Each fact table includes the keys to associated dimension tables. These are known as
foreign keys in the fact table.

Fact tables typically include a small number of columns.

When it is compared to dimension tables, fact tables have a large number of rows.

Dimension Table
Dimension tables establish the context of the facts. Dimensional tables store fields that
describe the facts.

Characteristics of the Dimension table

Dimension tables contain the details about the facts. That, as an example, enables the
business analysts to understand the data and their reports better.

The dimension tables include descriptive data about the numerical values in the fact table.
That is, they contain the attributes of the facts. For example, the dimension tables for a
marketing analysis function might include attributes such as time, marketing region, and
product type.

Since the record in a dimension table is denormalized, it usually has a large number of
columns. The dimension tables include significantly fewer rows of information than the
fact table.

The attributes in a dimension table are used as row and column headings in a document
or query results display.
Example: A store summary in a fact table can be viewed by city and state; an item summary can be viewed by brand, color, etc.; customer information can be viewed by name and address.

Fact Table

Time ID | Product ID | Customer ID | Units Sold
4 | 17 | 2 | 1
8 | 21 | 3 | 2
8 | 4 | 1 | 1

In this example, the Customer ID column in the fact table is the foreign key that joins with the dimension table. By following the links, we can see that row 2 of the fact table records the fact that customer 3, Gaurav, bought two items on day 8.

Dimension Table

Customer ID | Name | Gender | Income | Education | Region
1 | Rohan | Male | 2 | 3 | 4
2 | Sandeep | Male | 3 | 5 | 1
3 | Gaurav | Male | 1 | 7 | 3
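
The fact described above can be recovered with a simple join. Here is a sketch using SQLite with the two tables shown (the column types are assumed):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (time_id INT, product_id INT, customer_id INT, units_sold INT)")
conn.execute("CREATE TABLE dim_customer (customer_id INT PRIMARY KEY, name TEXT, gender TEXT, "
             "income INT, education INT, region INT)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(4, 17, 2, 1), (8, 21, 3, 2), (8, 4, 1, 1)])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, "Rohan", "Male", 2, 3, 4),
                  (2, "Sandeep", "Male", 3, 5, 1),
                  (3, "Gaurav", "Male", 1, 7, 3)])

# The foreign key customer_id links each fact row to its dimension row.
for row in conn.execute("SELECT f.time_id, c.name, f.units_sold "
                        "FROM fact_sales f JOIN dim_customer c "
                        "ON f.customer_id = c.customer_id"):
    print(row)   # (8, 'Gaurav', 2) is the fact described above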

Hierarchy
A hierarchy is a directed tree whose nodes are dimensional attributes and whose arcs model many-to-one associations between dimensional attributes. It contains a dimension, positioned at the tree's root, and all of the dimensional attributes that describe it. For example, in a time dimension, day rolls up to month, then to quarter, then to year.

What is Multi-Dimensional Data Model?


A multidimensional model views data in the form of a data-cube. A data cube enables
data to be modeled and viewed in multiple dimensions. It is defined by dimensions and
facts.

The dimensions are the perspectives or entities concerning which an organization keeps records. For example, a shop may create a sales data warehouse to keep records of the store's sales for the dimensions time, item, and location. These dimensions allow the store to keep track of things such as monthly sales of items and the locations at which the items were sold. Each dimension has a table related to it, called a dimension table, which describes the dimension further. For example, a dimension table for item may contain the attributes item_name, brand, and type.

A multidimensional data model is organized around a central theme, for example, sales.
This theme is represented by a fact table. Facts are numerical measures. The fact table
contains the names of the facts or measures of the related dimensional tables.
Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the table. In this 2-D representation, the sales for Delhi are shown for the time dimension (organized in quarters) and the item dimension (classified according to the types of items sold). The fact or measure displayed is rupees_sold (in thousands).

Now, suppose we want to view the sales data with a third dimension. For example, suppose the data according to time and item, as well as location, is considered for the cities Chennai, Kolkata, Mumbai, and Delhi. These 3-D data are shown in the table, represented as a series of 2-D tables. Conceptually, the same data may also be represented in the form of a 3-D data cube, as shown in the figure.
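
Alongside the figures, the same "series of 2-D tables" view of 3-D data can be sketched with a pivot table; the sales figures below are invented purely for illustration:

import pandas as pd

# Each row is one fact: three dimensions (quarter, item, location) and one measure.
sales = pd.DataFrame({
    "quarter":     ["Q1", "Q1", "Q2", "Q2", "Q1", "Q2"],
    "item":        ["TV", "PC", "TV", "PC", "TV", "PC"],
    "location":    ["Delhi", "Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    "rupees_sold": [605, 825, 680, 952, 818, 728],
})

# One 2-D table (time x item) per location: the series of 2-D tables that
# together represent the 3-D data.
for city, city_slice in sales.groupby("location"):
    print(city)
    print(city_slice.pivot_table(index="quarter", columns="item",
                                 values="rupees_sold", aggfunc="sum"))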

What is Data Cube?


When data is grouped or combined in multidimensional matrices, these are called data cubes. The data cube method has a few alternative names or variants, such as "multidimensional databases," "materialized views," and "OLAP (On-Line Analytical Processing)."

The general idea of this approach is to materialize certain expensive computations that are frequently queried.

For example, a relation with the schema sales(part, supplier, customer, sale-price) can be materialized into a set of eight views as shown in the figure, where psc indicates a view consisting of aggregate function values (such as total-sales) computed by grouping the three attributes part, supplier, and customer; p indicates a view composed of the corresponding aggregate function values calculated by grouping part alone; and so on.
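
A sketch of how these eight group-by views could be enumerated and materialized in Python; the sales facts are invented purely for illustration:

from itertools import combinations

# Invented sales facts: (part, supplier, customer, sale_price).
facts = [
    ("p1", "s1", "c1", 10.0),
    ("p1", "s2", "c1", 15.0),
    ("p2", "s1", "c2", 20.0),
]
dims = ("part", "supplier", "customer")

# Enumerate all 2^3 = 8 views: psc, ps, pc, sc, p, s, c, and the empty grouping.
for k in range(len(dims), -1, -1):
    for group in combinations(range(len(dims)), k):
        view = {}
        for *keys, price in facts:
            cell = tuple(keys[i] for i in group)       # group by the chosen dimensions
            view[cell] = view.get(cell, 0.0) + price   # aggregate total-sales
        name = "".join(dims[i][0] for i in group) or "(all)"
        print(name, view)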

A data cube is created from a subset of attributes in the database. Specific attributes are chosen to be measure attributes, i.e., the attributes whose values are of interest. Other attributes are selected as dimensions or functional attributes. The measure attributes are aggregated according to the dimensions.

For example, XYZ may create a sales data warehouse to keep records of the store's sales
for the dimensions time, item, branch, and location. These dimensions enable the store to
keep track of things like monthly sales of items, and the branches and locations at which
the items were sold. Each dimension may have a table identify with it, known as a
dimensional table, which describes the dimensions. For example, a dimension table for
items may contain the attributes item_name, brand, and type.

The data cube method is an interesting technique with many applications. Data cubes can be sparse in many cases, because not every cell in each dimension may have corresponding data in the database. Techniques should be developed to handle sparse cubes efficiently.

If a query contains constants at even lower levels than those provided in a data cube, it is
not clear how to make the best use of the precomputed results stored in the data cube.

The model views data in the form of a data cube. OLAP tools are based on the multidimensional data model. Data cubes usually model n-dimensional data.

A data cube enables data to be modeled and viewed in multiple dimensions. A multidimensional data model is organized around a central theme, like sales or transactions, represented by a fact table. Facts are numerical measures. Thus, the fact table contains measures (such as Rs_sold) and keys to each of the related dimension tables.

Dimensions are the perspectives that define a data cube. Facts are generally quantities, which are used for analyzing the relationships between dimensions.

Example: In the 2-D representation, we will look at the All Electronics sales data for items sold per quarter in the city of Vancouver. The measure displayed is dollars sold (in thousands).
3-Dimensional Cuboids
Suppose we would like to view the sales data with a third dimension. For example, suppose we would like to view the data according to time and item, as well as location, for the cities Chicago, New York, Toronto, and Vancouver. The measure displayed is dollars sold (in thousands). These 3-D data are shown in the table, represented as a series of 2-D tables.

Conceptually, we may represent the same data in the form of 3-D data cubes, as shown
in fig:
Let us suppose that we would like to view our sales data with an additional fourth
dimension, such as a supplier.

In data warehousing, the data cubes are n-dimensional. The cuboid which holds the
lowest level of summarization is called a base cuboid.

For example, the 4-D cuboid in the figure is the base cuboid for the given time, item,
location, and supplier dimensions.
The figure shows a 4-D data cube representation of sales data, according to the dimensions time, item, location, and supplier. The measure displayed is dollars sold (in thousands).

The topmost 0-D cuboid, which holds the highest level of summarization, is known as the
apex cuboid. In this example, this is the total sales, or dollars sold, summarized over all
four dimensions.

The lattice of cuboids forms a data cube. The figure shows the lattice of cuboids creating a 4-D data cube for the dimensions time, item, location, and supplier. Each cuboid represents a different degree of summarization.
What is Star Schema?
A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or a login. A dimension includes reference data about the fact, such as date, item, or customer.

A star schema is a relational schema whose design represents a multidimensional data model. The star schema is the simplest data warehouse schema. It is known as a star schema because the entity-relationship diagram of this schema simulates a star, with points diverging from a central table. The center of the schema consists of a large fact table, and the points of the star are the dimension tables.
Fact Tables
A fact table in a star schema contains the facts and is connected to the dimensions. A fact table has two types of columns: those that contain facts and those that are foreign keys to the dimension tables. The primary key of the fact table is generally a composite key made up of all of its foreign keys.

A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). A fact table generally contains facts at the same level of aggregation.

Dimension Tables
A dimension is a structure usually composed of one or more hierarchies that categorize data. If a dimension has no hierarchies and levels, it is called a flat dimension or list. The primary keys of each of the dimension tables are part of the composite primary key of the fact table. Dimensional attributes help to describe the dimensional value. They are generally descriptive, textual values. Dimension tables are usually smaller in size than fact tables.

Fact tables store data about sales, while dimension tables store data about the geographic region (markets, cities), clients, products, times, and channels.

Characteristics of Star Schema


The star schema is highly suitable for data warehouse database design because of the following features:

o It creates a denormalized database that can quickly provide query responses.
o It provides a flexible design that can be changed easily or added to throughout the development cycle, and as the database grows.
o It parallels in design how end-users typically think of and use the data.
o It reduces the complexity of metadata for both developers and end-users.

Advantages of Star Schema


Star schemas are easy for end-users and applications to understand and navigate. With a well-designed schema, the customer can instantly analyze large, multidimensional data sets.

The main advantages of star schemas in a decision-support environment are:

Query Performance
Because a star schema database has a small number of tables and clear join paths, queries run faster than they do against OLTP systems. Small single-table queries, frequently of a dimension table, are almost instantaneous. Large join queries that contain multiple tables take only seconds or minutes to run.

In a star schema database design, the dimensions are connected only through the central fact table. When two dimension tables are used in a query, only one join path, intersecting the fact table, exists between those two tables. This design feature enforces authentic and consistent query results.
Load performance and administration
Structural simplicity also decreases the time required to load large batches of records into a star schema database. By describing facts and dimensions and separating them into various tables, the impact of a load operation is reduced. Dimension tables can be populated once and occasionally refreshed. We can add new facts regularly and selectively by appending records to the fact table.

Built-in referential integrity


A star schema has referential integrity built in when information is loaded. Referential integrity is enforced because each record in a dimension table has a unique primary key, and all keys in the fact table are legitimate foreign keys drawn from the dimension tables. A record in the fact table that is not related correctly to a dimension cannot be given the correct key value to be retrieved.

Easily Understood
A star schema is simple to understand and navigate, with dimensions joined only through the fact table. These joins are more significant to the end-user because they represent the fundamental relationships between parts of the underlying business. Customers can also browse dimension table attributes before constructing a query.

Disadvantage of Star Schema


There are some conditions that cannot be met by star schemas; for example, the relationship between a user and a bank account cannot be described as a star schema, as the relationship between them is many to many.

Example: Suppose a star schema is composed of a fact table, SALES, and several
dimension tables connected to it for time, branch, item, and geographic locations.

The TIME table has columns for day, month, quarter, and year. The ITEM table has columns for item_key, item_name, brand, type, and supplier_type. The BRANCH table has columns for branch_key, branch_name, and branch_type. The LOCATION table has columns of geographic data, including street, city, state, and country.
In this scenario, the SALES table contains only four columns with IDs from the dimension
tables, TIME, ITEM, BRANCH, and LOCATION, instead of four columns for time data, four
columns for ITEM data, three columns for BRANCH data, and four columns for LOCATION
data. Thus, the size of the fact table is significantly reduced. When we need to change an
item, we need only make a single change in the dimension table, instead of making many
changes in the fact table.
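
A minimal sketch of this star schema in SQL, executed here through SQLite; the column lists follow the example above, while the data types are assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, day INT, month INT, quarter INT, year INT);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
                       supplier_type TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT,
                       country TEXT);

-- The central fact table holds only the measures plus one foreign key per dimension;
-- together the foreign keys form its composite primary key.
CREATE TABLE sales (
    time_key     INT REFERENCES time(time_key),
    item_key     INT REFERENCES item(item_key),
    branch_key   INT REFERENCES branch(branch_key),
    location_key INT REFERENCES location(location_key),
    rupees_sold  REAL,
    units_sold   INT,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);
""")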

We can create even more complex star schemas by normalizing a dimension table into
several tables. The normalized dimension table is called a Snowflake.

What is Snowflake Schema?


A snowflake schema is equivalent to the star schema. "A schema is known as a snowflake
if one or more dimension tables do not connect directly to the fact table but must join
through other dimension tables."

The snowflake schema is an expansion of the star schema where each point of the star
explodes into more points. It is called snowflake schema because the diagram of
snowflake schema resembles a snowflake. Snowflaking is a method of normalizing the
dimension tables in a STAR schemas. When we normalize all the dimension tables entirely,
the resultant structure resembles a snowflake with the fact table in the middle.
Snowflaking is used to develop the performance of specific queries. The schema is
diagramed with each fact surrounded by its associated dimensions, and those dimensions
are related to other dimensions, branching out into a snowflake pattern.

The snowflake schema consists of one fact table linked to many dimension tables, which
can in turn be linked to other dimension tables through a many-to-one relationship. Tables
in a snowflake schema are generally normalized to third normal form. Each dimension
table represents exactly one level in a hierarchy.

The following diagram shows a snowflake schema with two dimensions, each having three
levels. A snowflake schema can have any number of dimensions, and each dimension can
have any number of levels.

Example: The figure shows a snowflake schema with a Sales fact table and Store, Location,
Time, Product, Line, and Family dimension tables. The Market dimension has two
dimension tables, with Store as the primary dimension table and Location as the outrigger
dimension table. The Product dimension has three dimension tables, with Product as the
primary dimension table and the Line and Family tables as the outrigger dimension
tables.
A star schema stores all attributes for a dimension in one denormalized table. This
requires more disk space than a more normalized snowflake schema. Snowflaking
normalizes the dimension by moving attributes with low cardinality into separate
dimension tables that relate to the core dimension table through foreign keys.
Snowflaking for the sole purpose of minimizing disk space is not recommended, because
it can adversely impact query performance.
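As a hedged sketch of this idea, the Product/Line/Family hierarchy from the example
above might be snowflaked as follows (Python/sqlite3; the column names are
hypothetical):

import sqlite3

snowflake = """
-- Low-cardinality attributes move out into outrigger tables.
CREATE TABLE family  (family_key INTEGER PRIMARY KEY, family_name TEXT);
CREATE TABLE line    (line_key   INTEGER PRIMARY KEY, line_name TEXT,
                      family_key INTEGER REFERENCES family(family_key));

-- The core dimension table keeps its own attributes plus a foreign key
-- pointing one level up the hierarchy.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                      line_key    INTEGER REFERENCES line(line_key));
"""
sqlite3.connect(":memory:").executescript(snowflake)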

In a snowflake schema, tables are normalized to remove redundancy: each dimension
table is decomposed into multiple dimension tables.
The figure shows a simple STAR schema for sales in a manufacturing company. The sales
fact table includes quantity, price, and other relevant metrics. SALESREP, CUSTOMER,
PRODUCT, and TIME are the dimension tables.

The STAR schema for sales, as shown above, contains only five tables, whereas the
normalized version extends to eleven tables. Notice that in the snowflake schema, the
attributes with low cardinality in each original dimension table are removed to form
separate tables. These new tables are connected back to the original dimension table
through artificial keys.
A snowflake schema is designed for flexible querying across more complex dimensions
and relationships. It is suitable for many-to-many and one-to-many relationships between
dimension levels.

Advantage of Snowflake Schema


1. The primary advantage of the snowflake schema is the improvement in query
performance due to minimized disk storage requirements and the joining of
smaller lookup tables.
2. It provides greater scalability in the interrelationship between dimension levels and
components.
3. No redundancy, so it is easier to maintain.

Disadvantage of Snowflake Schema


1. The primary disadvantage of the snowflake schema is the additional maintenance
effort required due to the increased number of lookup tables.
2. Queries are more complex and therefore harder to understand.
3. More tables mean more joins, and hence longer query execution time.

Difference between Star and Snowflake Schemas
Star Schema
o In a star schema, the fact table sits at the center and is connected directly to the
dimension tables.
o The tables are completely denormalized in structure.
o SQL query performance is good because fewer joins are involved.
o Data redundancy is high, and the schema occupies more disk space.

Snowflake Schema
o A snowflake schema is an extension of star schema where the dimension tables are
connected to one or more dimensions.
o The tables are partially denormalized in structure.
o The performance of SQL queries is a bit less when compared to star schema as
more number of joins are involved.
o Data redundancy is low and occupies less disk space when compared to star
schema.
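The join difference shows up directly in query shape. The sketch below (hypothetical
table and column names, following the Product/Line/Family example from earlier)
contrasts the single-hop star join with the snowflake join chain:

# Star schema: family_name is denormalized into the product dimension,
# so a single join answers the question.
star_query = """
SELECT p.family_name, SUM(f.dollars_sold)
FROM   sales f
JOIN   product p ON p.product_key = f.product_key
GROUP  BY p.family_name;
"""

# Snowflake schema: the same question must walk the normalized hierarchy,
# adding one join per dimension level.
snowflake_query = """
SELECT fa.family_name, SUM(f.dollars_sold)
FROM   sales f
JOIN   product p  ON p.product_key = f.product_key
JOIN   line    l  ON l.line_key    = p.line_key
JOIN   family  fa ON fa.family_key = l.family_key
GROUP  BY fa.family_name;
"""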

What is OLAP (Online Analytical Processing)?


OLAP stands for On-Line Analytical Processing. OLAP is a category of software
technology that enables analysts, managers, and executives to gain insight into
information through fast, consistent, interactive access to a wide variety of possible views
of data that has been transformed from raw information to reflect the real dimensionality
of the enterprise as understood by the user.

OLAP implements the multidimensional analysis of business information and supports
complex calculations, trend analysis, and sophisticated data modeling. It is rapidly
becoming the essential foundation for intelligent solutions, including Business
Performance Management, Planning, Budgeting, Forecasting, Financial Reporting,
Analysis, Simulation Models, Knowledge Discovery, and Data Warehouse Reporting.
OLAP enables end users to perform ad hoc analysis of data in multiple dimensions,
providing the insight and understanding they require for better decision making.

Who uses OLAP and Why?


OLAP applications are used across a variety of business functions in an organization.

Finance and accounting:

o Budgeting
o Activity-based costing
o Financial performance analysis
o Financial modeling

Sales and Marketing

o Sales analysis and forecasting
o Market research analysis
o Promotion analysis
o Customer analysis
o Market and customer segmentation

Production

o Production planning
o Defect analysis

OLAP cubes have two main purposes. The first is to provide business users with a data
model more intuitive to them than a tabular model. This model is called a Dimensional
Model.
The second purpose is to enable fast query response that is usually difficult to achieve
using tabular models.
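As a rough illustration of the dimensional view (a sketch using pandas, which is an
assumed third-party dependency here; the data values are invented), a pivot table is a
two-dimensional slice of a cube, with one dimension on each axis and the measure
aggregated at every intersection:

import pandas as pd

# Toy fact data: product and quarter are dimensions, dollars is the measure.
facts = pd.DataFrame({
    "product": ["Widget", "Widget", "Gadget", "Gadget"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "dollars": [100.0, 120.0, 80.0, 95.0],
})

# One dimension per axis; SUM(dollars) at each intersection.
cube_slice = facts.pivot_table(index="product", columns="quarter",
                               values="dollars", aggfunc="sum")
print(cube_slice)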

Difference between OLTP and OLAP


OLTP (On-Line Transaction Processing) is characterized by a large number of short on-line
transactions (INSERT, UPDATE, and DELETE). The primary emphasis of OLTP systems
is very fast query processing, maintaining data integrity in multi-access environments,
and effectiveness measured by the number of transactions per second. An OLTP database
holds accurate, current records, and the schema used to store the transactional data is
the entity model (usually 3NF).

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of
transactions. Queries are often complex and involve aggregations. For OLAP systems,
response time is an effectiveness measure. OLAP applications are widely used in data
mining. An OLAP database stores aggregated, historical information in multi-dimensional
schemas (usually the star schema).
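A minimal sketch of the contrast (hypothetical table and column names): the OLTP
statement touches one current row through its key, while the OLAP query scans and
aggregates years of history across dimensions:

# OLTP: a short, index-driven transaction against a single current record.
oltp_statement = "UPDATE account SET balance = balance - 50 WHERE account_id = 1042;"

# OLAP: a long-running analytical query over the star schema, aggregating
# historical facts by country and year.
olap_query = """
SELECT l.country, t.year, SUM(f.dollars_sold) AS total_sales
FROM   sales f
JOIN   location l ON l.location_key = f.location_key
JOIN   time     t ON t.time_key     = f.time_key
GROUP  BY l.country, t.year;
"""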

The following are the differences between OLAP and OLTP systems.

1) Users: OLTP systems are designed for office workers, while OLAP systems are
designed for decision-makers. Therefore, while an OLTP system may be accessed by
hundreds or even thousands of clients in a large enterprise, an OLAP system is likely to
be accessed only by a select group of managers and may be used by only dozens of users.

2) Functions: OLTP systems are mission-critical. They support the day-to-day operations
of an enterprise and are largely performance and availability driven. They carry out
simple, repetitive operations. OLAP systems are management-critical, supporting an
enterprise's decision tasks through detailed analysis.

3) Nature: Although SQL queries return a set of records, OLTP systems are designed to
process one record at a time, for example, a record related to the customer who may be
on the phone or in the store. OLAP systems are not designed to deal with individual
customer records. Instead, they involve queries that deal with many records at a time and
provide summary or aggregate information to a manager. OLAP applications involve data
stored in a data warehouse that has been extracted from many tables, and possibly from
more than one enterprise database.
4) Design: OLTP database operations are designed to be application-oriented,
while OLAP operations are designed to be subject-oriented. OLTP systems view the
enterprise's records as a collection of tables (possibly based on an entity-relationship
model), whereas OLAP systems view enterprise information as multidimensional.

5) Data: OLTP systems usually deal only with the current status of data. For example, a
record about an employee who left three years ago may not be available in the Human
Resources system; the old data may have been archived on some type of stable storage
media and may not be accessible online. On the other hand, OLAP systems require
historical data over several years, since trends are often essential in decision making.

6) Kind of use: OLTP systems are used for both read and write operations, while OLAP
systems usually do not update the data.

7) View: An OLTP system focuses primarily on the current data within an enterprise or
department, without referring to historical data or data in other organizations. In
contrast, an OLAP system spans multiple versions of a database schema, due to the
evolutionary process of an organization. OLAP systems also deal with information that
originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.

Difference between ROLAP, MOLAP, and HOLAP
ROLAP
o ROLAP stands for Relational Online Analytical Processing.
o The ROLAP storage mode causes the aggregations of a partition to be stored in
indexed views in the relational database that was specified in the partition's data
source.
o ROLAP does not cause a copy of the source data to be stored in the Analysis
Services data folders. Instead, when a result cannot be derived from the query
cache, the indexed views in the data source are accessed to answer queries.
o Query response is frequently slower with ROLAP storage than with the MOLAP or
HOLAP storage modes. Processing time is also frequently slower with ROLAP.

MOLAP
o MOLAP stands for Multidimensional Online Analytical Processing.
o The MOLAP storage mode causes the aggregations of a partition, and a copy of its
source data, to be stored in a multidimensional structure in Analysis Services when
the partition is processed.
o The MOLAP structure is highly optimized to maximize query performance. The
storage can be on the computer where the partition is defined or on another
computer running Analysis Services. Because a copy of the source data resides in
the multidimensional structure, queries can be resolved without accessing the
partition's source data.
o Query response times can be reduced substantially by using aggregations. The data
in the partition's MOLAP structure is only as current as the most recent processing
of the partition.

HOLAP
o HOLAP stands for Hybrid Online Analytical Processing.
o The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like
MOLAP, HOLAP causes the aggregations of a partition to be stored in a
multidimensional structure in a SQL Server Analysis Services instance.
o HOLAP does not cause a copy of the source data to be stored. For queries that
access only summary data in the aggregations of a partition, HOLAP is the
equivalent of MOLAP.
o Queries that access source data (for example, a drill-down to an atomic cube cell
for which there is no aggregation data) must retrieve it from the relational database
and will not be as fast as they would be if the source data were stored in the MOLAP
structure.
