Chapter 2
Data Warehouse
Modelling
Amol D. Vibhute (PhD)
Assistant Professor
Email ID:- amol.vibhute@sicsr.ac.in
Roadmap of Chapter:
• Introduction
• Dimension Modeling,
• Data Warehouse Schemas,
• Cube Construction and Computation and Data Generalization,
• Dimension Model Creation Case Studies.
Tuesday, February 4, 2025 Dr. Amol 2
Introduction:
• Data warehouse modeling is the process of designing the schemas of the detailed and summarized information of the
data warehouse.
• The goal of data warehouse modeling is to develop a schema describing the reality, or at least a part of the fact, which
the data warehouse is needed to support.
• Data warehouse modeling is an essential stage of building a data warehouse for two main reasons.
– Firstly, through the schema, data warehouse clients can visualize the relationships among the warehouse data, to use them wit h greater ease.
– Secondly, a well-designed schema allows an effective data warehouse structure to emerge, to help decrease the cost of implementi ng the warehouse
and improve the efficiency of using it.
• Data modeling in data warehouses is different from data modeling in operational database systems.
• The primary function of data warehouses is to support DSS processes. Thus, the objective of data warehouse modeling
is to make the data warehouse efficiently support complex queries on long term information.
• The data warehouses are designed for the customer with general information knowledge about the enterprise.
Tuesday, February 4, 2025 Dr. Amol 3
Cont.…
• Metadata is the final element of the data warehouses and is really of various dimensions
in which it is not the same as file drawn from the operational data, but it is used as:-
– A directory to help the DSS investigator locate the items of the data warehouse.
– A guide to the mapping of record as the data is changed from the operational data to the data warehouse
environment.
– A guide to the method used for summarization between the current, accurate data and the lightly
summarized information and the highly summarized data, etc.
Tuesday, February 4, 2025 Dr. Amol 4
Data Modelling Life Cycle:
• It is a straight forward process of transforming the business requirements to fulfill the goals
for storing, maintaining, and accessing the data within IT systems. The result is a logical
and physical data model for an enterprise data warehouse.
• The objective of the data modeling life cycle is primarily the creation of a storage area for
business information. That area comes from the logical and physical data modeling
stages, as shown in Figure:
Tuesday, February 4, 2025 Dr. Amol 5
Cont.…
• Three levels/techniques of data model: Conceptual, Logical and Physical
Data Model.
• Conceptual Data Model: Helps to describe the semantics (meaning) of a domain.
– A conceptual data model recognizes the highest-level relationships between the different entities.
– Characteristics of the conceptual data model
• It contains the essential entities and the relationships among them.
• No attribute is specified.
• No primary key is specified.
– We can see that the only data shown via the conceptual data model is the entities that define the data and the relationships
between those entities. No other data, as shown through the conceptual data model.
Tuesday, February 4, 2025 Dr. Amol 6
Cont.…
• Logical Data Model:
– A logical data model defines the information in as much structure as possible, without observing how
they will be physically achieved in the database. The primary objective of logical data modeling is to
document the business data structures, processes, rules, and relationships by a single view - the logical
data model.
– Features of a logical data model
• It involves all entities and relationships among them.
• All attributes for each entity are specified.
• The primary key for each entity is stated.
• Referential Integrity is specified (FK Relation).
– The phase for designing the logical data model which are as follows:
• Specify primary keys for all entities.
• List the relationships between different entities.
• List all attributes for each entity.
• Normalization.
• No data types are listed
Tuesday, February 4, 2025 Dr. Amol 7
Cont.…
• Physical Data Model:
– Physical data model describes how the model will be presented in the database. A physical database
model demonstrates all table structures, column names, data types, constraints, primary key, foreign key,
and relationships between tables. The purpose of physical data modeling is the mapping of the logical
data model to the physical structures of the RDBMS system hosting the data warehouse. This contains
defining physical RDBMS structures, such as tables and data types to use when storing the information.
It may also include the definition of new data structures for enhancing query performance.
– Characteristics of a physical data model
• Specification all tables and columns.
• Foreign keys are used to recognize relationships between tables.
– The steps for physical data model design which are as follows:
• Convert entities to tables.
• Convert relationships to foreign keys.
• Convert attributes to columns.
Tuesday, February 4, 2025 Dr. Amol 8
Cont.…
• Types of Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual
Warehouses.
– An Enterprise warehouse collects all of the records about subjects spanning the entire organization. It
supports corporate-wide data integration, usually from one or more operational systems or external data
providers, and it's cross-functional in scope. It generally contains detailed information as well as
summarized information and can range in estimate from a few gigabyte to hundreds of gigabytes,
terabytes, or beyond. Requires mainframe or supercomputers.
– A data mart (Independent and dependent) includes a subset of corporate-wide data that is of value to
a specific collection of users. The scope is confined to particular selected subjects. For example, a
marketing data mart may restrict its subjects to the customer, items, and sales. The data contained in the
data marts tend to be summarized.
– Virtual Data Warehouses is a set of perception over the operational database. For effective query
processing, only some of the possible summary vision may be materialized. A virtual warehouse is
simple to build but required excess capacity on operational database servers.
Tuesday, February 4, 2025 Dr. Amol 9
Dimensional Data Modelling:
• A set of guidelines to design database table structure for easier and faster data retrieved.
• Dimensional data modeling is a technique used in data warehousing to organize and
structure data in a way that makes it easy to analyze and understand.
• In a dimensional data model, data is organized into dimensions and facts.
• Overall, dimensional data modeling is an effective technique for organizing and
structuring data in a data warehouse for analysis and reporting.
• By providing a simple and intuitive structure for the data, the dimensional model makes it
easy for users to access and understand the data they need to make informed business
decisions.
• The concept of Dimensional Modeling was developed by Ralph Kimball (in 1996) which is
comprised of facts and dimension tables. Since the main goal of this modeling is to
improve the data retrieval so it is optimized for SELECT OPERATION.
• The advantage of using this model is that we can store data in such a way that it is easier
to store and retrieve the data once stored in a data warehouse.
• The dimensional model is the data model used by many OLAP systems.
Tuesday, February 4, 2025 Dr. Amol 10
Cont.…
• Steps to Create Dimensional Data Modeling:
– Step-1: Identifying the business objective:
• The first step is to identify the business objective. Sales, HR, Marketing, etc. are some examples of the need of
the organization. Since it is the most important step of Data Modelling the selection of business objectives also
depends on the quality of data available for that process.
– Step-2: Identifying Granularity:
• Granularity is the lowest level of information stored in the table. The level of detail for business problems and its
solution is described by Grain.
– Step-3: Identifying Dimensions and their Attributes:
• Dimensions are objects or things. Dimensions categorize and describe data warehouse facts and measures in a
way that supports meaningful answers to business questions. A data warehouse organizes descriptive attributes
as columns in dimension tables. For Example, the data dimension may contain data like a year, month, and
weekday.
– Step-4: Identifying the Fact:
• The measurable data is held by the fact table. Most of the fact table rows are numerical values like price or cost
per unit, etc.
– Step-5: Building of Schema:
• We implement the Dimension Model in this step. A schema is a database structure. There are two popular
schemes: Star Schema and Snowflake Schema.
Tuesday, February 4, 2025 Dr. Amol 11
Cont.…
• Elements/components of dimensional data model:
– Facts:
• Facts are the measurable data elements that represent the business metrics of interest. For example, in a sales data
warehouse, the facts might include sales revenue, units sold, and profit margins. Each fact is associated with one or more
dimensions, creating a relationship between the fact and the descriptive data.
– Dimension:
• Dimensions are the descriptive data elements that are used to categorize or classify the data. For example, in a sales data
warehouse, the dimensions might include product, customer, time, and location. Each dimension is made up of a set of
attributes that describe the dimension. For example, the product dimension might include attributes such as product name,
product category, and product price.
– Attributes:
• Characteristics of dimension in data modeling are known as attributes. These are used to filter, search facts, etc. For a
dimension of location, attributes can be State, Country, Zipcode, etc.
– Fact Table:
• In a dimensional data model, the fact table is the central table that contains the measures or metrics of interest,
surrounded by the dimension tables that describe the attributes of the measures. The dimension tables are related to the fact
table through foreign key relationships
– Dimension Table:
• Dimensions of a fact are mentioned by the dimension table and they are basically joined by a foreign key. Dimension
tables are simply de-normalized tables. The dimensions can be having one or more relationships.
Tuesday, February 4, 2025 Dr. Amol 12
Dimensional Data Modelling techniques/schemas:
Star Schema, Snowflake Schema, and Fact Constellation Schema
• Star Schema:
– A star schema is a type of data modeling technique used in data warehousing to represent data in a
structured and intuitive way. In a star schema, data is organized into a central fact table that contains the
measures of interest, surrounded by dimension tables that describe the attributes of the measures.
– The fact table in a star schema contains the measures or metrics that are of interest to the user or
organization. For example, in a sales data warehouse, the fact table might contain sales revenue, units
sold, and profit margins. Each record in the fact table represents a specific event or transaction, such as a
sale or order.
– The dimension tables in a star schema contain the descriptive attributes of the measures in the fact table.
These attributes are used to slice and dice the data in the fact table, allowing users to analyze the data
from different perspectives. For example, in a sales data warehouse, the dimension tables might include
product, customer, time, and location.
– Star schema is the fundamental schema among the data mart schema and it is simplest. This schema is
widely used to develop or build a data warehouse and dimensional data marts. It includes one or more
fact tables indexing any number of dimensional tables. The star schema is a necessary cause of the
snowflake schema. It is also efficient for handling basic queries.
Tuesday, February 4, 2025 Dr. Amol 13
Cont.…
• Star Schema:
– In the figure, SALES is a fact table having attributes i.e.
(Product ID, Order ID, Customer ID, Employer ID, Total,
Quantity, Discount) which references to the dimension
tables.
– Employee dimension table contains the attributes:
Emp ID, Emp Name, Title, Department and Region.
– Product dimension table contains the attributes:
Product ID, Product Name, Product Category, Unit Price.
– Customer dimension table contains the attributes:
Customer ID, Customer Name, Address, City, Zip.
– Time dimension table contains the attributes: Order
ID, Order Date, Year, Quarter, Month.
Tuesday, February 4, 2025 Dr. Amol 14
Case Study_1:
Tuesday, February 4, 2025 Dr. Amol 15
Case Study_2:
Tuesday, February 4, 2025 Dr. Amol 16
Cont.…
• Advantages of Star Schema:
– Simpler Queries –
• Join logic of star schema is quite cinch in comparison to other join logic which are needed to fetch data from a transactiona l
schema that is highly normalized.
– Simplified Business Reporting Logic –
• In comparison to a transactional schema that is highly normalized, the star schema makes simpler common business reporting
logic, such as of reporting and period-over-period.
– Feeding Cubes –
• Star schema is widely used by all OLAP systems to design OLAP cubes efficiently. In fact, major OLAP systems deliver a
ROLAP mode of operation which can use a star schema as a source without designing a cube structure.
• Features:
– Central fact table
– Dimension tables
– Denormalized structure
– Aggregated data
– Fast performance
– Easy to understand
Tuesday, February 4, 2025 Dr. Amol 17
Cont.…
• Snowflake Schema:
– The snowflake schema is a variant of the star schema. Here, the centralized fact table is connected to
multiple dimensions. In the snowflake schema, dimensions are present in a normalized form in
multiple related tables.
– Snowflaking is a method of normalizing the dimension tables in a STAR schemas. When we
normalize all the dimension tables entirely, the resultant structure resembles a snowflake with the fact
table in the middle.
– Snowflaking is used to develop the performance of specific queries. The schema is diagramed with
each fact surrounded by its associated dimensions, and those dimensions are related to other
dimensions, branching out into a snowflake pattern.
– In a snowflake schema, the fact table is still located at the center of the schema, surrounded by many
dimension tables. However, each dimension table is further broken down into multiple related
tables, creating a hierarchical structure that resembles a snowflake.
– Tables in a snowflake schema are generally normalized to the third normal form. Each dimension table
performs exactly one level in a hierarchy.
Tuesday, February 4, 2025 Dr. Amol 18
Cont.…
• The snowflake design is the result of further expansion and normalization of the dimension
table.
• In other words, a dimension table is said to be snowflaked if the low-cardinality attribute of
the dimensions has been divided into separate normalized tables.
• These tables are then joined to the original dimension table with referential constraints
(foreign key constrain).
• Generally, snowflaking is not recommended in the dimension table, as it hampers the
understandability and performance of the dimension model as more tables would be
required to be joined to satisfy the queries.
Tuesday, February 4, 2025 Dr. Amol 19
Cont.…
• For Example, in a sales data warehouse, the product
dimension table might be normalized into multiple related
tables, such as product category, product subcategory, and
product details. Each of these tables would be related to the
product dimension table through a foreign key relationship.
• The Employee dimension table now contains the attributes:
EmployeeID, EmployeeName, DepartmentID, Region, and
Territory. The DepartmentID attribute links with the
Employee table with the Department dimension table. The
Department dimension is used to provide detail about each
department, such as the Name and Location of the
department. The Customer dimension table now contains
the attributes: CustomerID, CustomerName, Address, and
CityID. The CityID attributes link the Customer dimension
table with the City dimension table. The City dimension
table has details about each city such as city name,
Zipcode, State, and Country.
Tuesday, February 4, 2025 Dr. Amol 20
Cont.…
• Characteristics of Snowflake Schema:
– The snowflake schema uses small disk space.
– It is easy to implement the dimension that is added to the schema.
– There are multiple tables, so performance is reduced.
– The dimension table consists of two or more sets of attributes that define information at different grains.
– The sets of attributes of the same dimension table are populated by different source systems.
• Features of the Snowflake Schema:
– Normalization
– Hierarchical Structure
– Multiple Levels
– Joins
– Scalability
Tuesday, February 4, 2025 Dr. Amol 21
Cont.…
• Advantages of Snowflake Schema:
– It provides structured data which reduces the problem of data integrity.
– It uses small disk space because data are highly structured.
• Disadvantages of Snowflake Schema:
– Snowflaking reduces space consumed by dimension tables but compared with the entire data warehouse
the saving is usually insignificant.
– Avoid snowflaking or normalization of a dimension table, unless required and appropriate.
– Do not snowflake hierarchies of dimension table into separate tables. Hierarchies should belong to the
dimension table only and should never be snowflakes.
– Multiple hierarchies that can belong to the same dimension have been designed at the lowest possible
detail.
Tuesday, February 4, 2025 Dr. Amol 22
Case Study_3:
Tuesday, February 4, 2025 Dr. Amol 23
Cont.…
• Difference Between Snowflake and Star Schema:
– The main difference between star schema and snowflake schema is that the dimension table of the
snowflake schema is maintained in the normalized form to reduce redundancy. The advantage here is
that such tables (normalized) are easy to maintain and save storage space. However, it also means that
more joins will be needed to execute the query. This will adversely impact system performance.
– However, the snowflake schema can also be more complex to query than a star schema because it
requires more table joins. This can result in slower query response times and higher resource usage in
the database. Additionally, the snowflake schema can be more difficult to understand and maintain
because of the increased complexity of the schema design.
– The decision to use a snowflake schema versus a star schema in a data warehousing project will depend
on the specific requirements of the project and the trade-offs between query performance, schema
complexity, and data integrity.
Tuesday, February 4, 2025 Dr. Amol 24
What is OLAP (online analytical processing)?
• OLAP, or online analytical processing, is technology for performing high-speed complex queries or
multidimensional analysis on large volumes of data in a data warehouse, data lake or other data repository.
• OLAP is used in business intelligence (BI), decision support, and a variety of business forecasting and
reporting applications.
• Most business data have multiple dimensions—multiple categories into which the data are broken down for
presentation, tracking, or analysis. For example, sales figures might have several dimensions related to
location (region, country, state/province, store), time (year, month, week, day), product (clothing,
men/women/children, brand, type), and more.
• But in a data warehouse or data lake, data sets are stored in tables, each of which can organize data into
just two of these dimensions at a time. OLAP extracts data from multiple relational data sets and reorganizes
it into a multidimensional format that enables very fast processing and very insightful analysis.
Tuesday, February 4, 2025 Dr. Amol 25
OLAP cube:
• The OLAP Cube consists of numeric facts called measures
which are categorized by dimensions. OLAP Cube is also
called the hypercube.
• Usually, data operations and analysis are performed using the
simple spreadsheet, where data values are arranged in row and
column format. This is ideal for two-dimensional data. However,
OLAP contains multidimensional data, with data usually
obtained from a different and unrelated source. Using a
spreadsheet is not an optimal option. The cube can store and
analyze multidimensional data in a logical and orderly manner.
• How does it work?
– A Data warehouse would extract information from multiple data sources and formats
like text files, excel sheet, multimedia files, etc.
– The extracted data is cleaned and transformed. Data is loaded into an OLAP server
(or OLAP cube) where information is pre-calculated in advance for further analysis.
– OLAP have the ability to analyze large amount of data for the extraction of valuable
information. Analytical development can be of business, education or medical
sectors. The technologies of data warehouse, OLAP, and analyzing tools support
that ability.
Tuesday, February 4, 2025 Dr. Amol 26
Cont.…
• OLAP Operations:
– Drill down:
• In drill-down operation, the less
detailed data is converted into
highly detailed data.
• It can be done by:
– Moving down in the concept hierarchy
– Adding a new dimension
– In the cube given in overview section,
the drill down operation is performed
by moving down in the concept
hierarchy of Time dimension (Quarter -
> Month).
Tuesday, February 4, 2025 Dr. Amol 27
Cont.…
• Roll (drill) up: Roll-up is also known as
“consolidation” or “aggregation.” The Roll-up
operation can be performed in 2 ways
– Reducing dimensions
– Climbing up concept hierarchy. Concept
hierarchy is a system of grouping things
based on their order or level.
• In this example, cities New jersey and Lost
Angles and rolled up into country USA
• The sales figure of New Jersey and Los Angeles
are 440 and 1560 respectively. They become
2000 after roll-up
• In this aggregation process, data is location
hierarchy moves up from city to the country.
• In the roll-up process at least one or more
dimensions need to be removed. In this
example, Cities dimension is removed.
Tuesday, February 4, 2025 Dr. Amol 28
Cont.…
– Slice:
• Here, one dimension is
selected, and a new sub-cube
is created.
• Given diagram explain how
slice operation performed.
• Dimension Time is Sliced with
Q1 as the filter.
• A new cube is created
altogether.
Tuesday, February 4, 2025 Dr. Amol 29
Cont.…
• Dice:
– This operation is similar to a
slice. The difference in dice
is you select 2 or more
dimensions that result in the
creation of a sub-cube.
Tuesday, February 4, 2025 Dr. Amol 30
Cont.…
– Pivots:
• In Pivot, you rotate the data
axes to provide a substitute
presentation of data.
• In the given example, the
pivot is based on item types.
Tuesday, February 4, 2025 Dr. Amol 31
Thank You !!!
Tuesday, February 4, 2025 Dr. Amol 32