For exclusive use with:
LECTURE 8 (TEXTBOOK CHAPTER 8 - Data Warehouse
Modeling i.e. Dimensional Modeling)
Copyright (c) 2023-25 Nenad Jukic and Prospect Press
INTRODUCTION
▪ ER modeling
• A predominant technique for visualizing database requirements, used
extensively for conceptual modeling of operational databases
▪ Relational modeling
• Standard method for logical modeling of operational databases
▪ Both of these techniques can also be used during the
development of data warehouses and data marts
▪ Dimensional modeling
• A modeling technique tailored specifically for analytical database design
purposes
• Regularly used in practice for modeling data warehouses and data
marts
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 2
DIMENSIONAL MODELING
▪ Dimensional modeling
• A data design methodology used for designing subject-oriented
analytical databases, such as data warehouses or data marts
• Commonly, dimensional modeling is employed as a relational data
modeling technique
• In addition to using regular relational concepts (primary keys, foreign
keys, integrity constraints, etc.) dimensional modeling distinguishes two
types of tables:
o Dimensions
o Facts
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 3
DIMENSIONAL MODELING
▪ Dimension tables (dimensions)
• Contain descriptions of the business, organization, or enterprise to
which the subject of analysis belongs
• Columns in dimension tables contain descriptive information that is
often textual (e.g., product brand, product color, customer gender,
customer education level), but can also be numeric (e.g., product
weight, customer income level)
• This information provides a basis for analysis of the subject
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 4
DIMENSIONAL MODELING
▪ Fact tables
• Contain measures related to the subject of analysis and the foreign keys
(associating fact tables with dimension tables)
• The measures in the fact tables are typically numeric and are intended
for mathematical computation and quantitative analysis
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 5
DIMENSIONAL MODELING
▪ Star schema
• The result of dimensional modeling is a dimensional schema containing
facts and dimensions
• The dimensional schema is often referred to as the star schema
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 6
DIMENSIONAL MODELING
A dimensional model (star schema)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 7
Initial Example: Dimensional Model Based on A Single Source
ER diagram : ZAGI Retail Company Sales Department Database (Source)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 8
Initial Example: Dimensional Model Based on A Single Source
Relational schema : ZAGI Retail Company Sales Department Database (Source)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 9
Initial Example: Dimensional Model Based on A Single Source
Data records: ZAGI Retail Company Sales Department Database (Source)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 10
Initial Example: Dimensional Model Based on A Single Source
ZAGI Retail Company dimensional model for the subject sales
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 11
DIMENSIONAL MODELING
▪ Star schema
• In the star schema, the chosen subject of analysis is represented by a
fact table
• Designing the star schema involves considering which dimensions to
use with the fact table representing the chosen subject
• For every dimension under consideration, two questions must be
answered:
o Question 1: Can the dimension table be created based on the existing data
sources?
o Question 2: Can the dimension table be useful for the analysis of the
chosen subject?
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 12
Initial Example: Dimensional Model Based on A Single Source
ZAGI Retail Company dimensional model for the subject sales, populated with the data from
the operational data source
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 13
DIMENSIONAL MODELING
▪ Characteristics of dimensions and facts
• A typical dimension contains relatively static data, while in a typical fact
table, records are added continually, and the table rapidly grows in size.
• In a typical dimensionally modeled analytical database, dimension
tables have orders of magnitude fewer records than fact tables
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 14
DIMENSIONAL MODELING
▪ Surrogate key
• Typically, in a star schema all dimension tables are given a simple, non-
composite system-generated key, also called a surrogate key
• Values for the surrogate keys are typically simple auto-increment integer
values
• Surrogate key values have no meaning or purpose except to give each
dimension a new column that serves as a primary key within the
dimensional model instead of the operational key
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 15
Initial Example: Dimensional Model Based on A Single Source
Example query
Query A: Compare the quantities of sold products on Saturdays in the category
Camping provided by the vendor Pacifica Gear within the Tristate region
between the 1st and 2nd quarter of the year 2020
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 16
Example query - Query A, dimensional version
SELECT SUM(SA.UnitsSold)
‚ P.ProductCategoryName
‚ P.ProductVendorName
‚ C.DayofWeek
‚ C.Qtr
FROM
Calendar C
‚ Store S
‚ Product P
‚ Sales SA
WHERE
C.CalendarKey = SA.CalendarKey
AND S.StoreKey = SA.StoreKey
AND P.ProductKey = SA.ProductKey
AND P.ProductVendorName = 'Pacifica Gear'
AND P.ProductCategoryName = 'Camping'
AND S.StoreRegionName = 'Tristate'
AND C.DayofWeek = 'Saturday'
AND C.Year = 2020
AND C.Qtr IN ( 'Q1', 'Q2' )
GROUP BY
P.ProductCategoryName,
P.ProductVendorName,
C.DayofWeek,
C.Qtr;
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 17
Example query - Query A, nondimensional version
SELECT SUM( I.Quantity )
, C.CategoryName
, V.VendorName
, EXTRACTWEEKDAY(ST.Date)
, EXTRACTQUARTER(ST.Date)
FROM
Region R
, Store S
, SalesTransaction ST
, Includes I
, Product P
, Vendor V
, Category C
WHERE
R.RegionID = S.RegionID
AND S.StoreID = ST.StoreID
AND ST.Tid = SV.Tid
AND I.ProductID = P.ProductID
AND P.VendorID = V.VendorID
AND P.CateoryID = C.CategoryID
AND V.VendorName = 'Pacifica Gear'
AND C.CategoryName = 'Camping'
AND R.RegionName = 'Tristate'
AND EXTRACTWEEKDAY(St.Date) = 'Saturday'
AND EXTRACTYEAR(ST.Date) = 2020
AND EXTRACTQUARTER(ST.Date) IN ( 'Q1', 'Q2' )
GROUP BY
C.CategoryName,
V.VendorName,
EXTRACTWEEKDAY(ST.Date),
EXTRACTQUARTER(ST.Date);
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 18
Expanded Example: Dimensional Model Based on Multiple Sources
ZAGI Retail Company Facilities Department Database (Source 2)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 19
Expanded Example: Dimensional Model Based on Multiple Sources
Customer Demographic Data Table - external source acquired from a market research
company (Source 3)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 20
Expanded Example: Dimensional Model Based on Multiple Sources
ZAGI Retail Company dimensional model for the subject sales
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 21
Expanded Example: Dimensional Model Based on Multiple Sources
ZAGI Retail Company dimensional model for the subject sales , populated with the data from
the three sources
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 22
Expanded Example: Dimensional Model Based on Multiple Sources
Example query
Query B: Compare the quantities of sold products to male customers in Modern
stores on Saturdays in the category Camping provided by the vendor Pacifica
Gear within the Tristate region between the 1st and 2nd quarter of the year
2020.
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 23
Example query - Query B, dimensional version
SELECT SUM(SA.UnitsSold)
‚ P.ProductCategoryName
‚ P.ProductVendorName
‚ C.DayofWeek
‚ C.Qtr
FROM
Calendar C
‚ Store S
‚ Product P
, Customer CU
‚ Sales SA
WHERE
C.CalendarKey = SA.CalendarKey
AND S.StoreKey = SA.StoreKey
AND P.ProductKey = SA.ProductKey
AND CU.CustomerKey = SA.CustomerKey
AND P.ProductVendorName = 'Pacifica Gear'
AND P.ProductCategoryName = 'Camping'
AND S.StoreRegionName = 'Tristate'
AND C.DayofWeek = 'Saturday'
AND C.Year = 2020
AND C.Qtr IN ( 'Q1', 'Q2' )
AND S.StoreLayout = 'Modern'
AND CU.Gender = 'Male'
GROUP BY
P.ProductCategoryName,
P.ProductVendorName,
C.DayofWeek,
C.Qtr;
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 24
DIMENSIONAL MODELING
▪ Additional possible fact attributes
• A fact table contains
o Foreign keys connecting the fact table to the dimension tables
o The measures related to the subject of analysis
• In addition to the measures related to the subject of analysis, in certain
cases fact tables can contain other attributes that are not measures
• Two of the most typical additional attributes that can appear in the fact
table are:
o Transaction identifier
o Transaction time
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 25
DIMENSIONAL MODELING
Additional possible fact attributes
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 26
Expanded Example: Dimensional Model with Transaction Identifier
ZAGI Retail Company dimensional model for the subject sales with transaction identifier
included
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 27
Expanded Example: Dimensional Model with Transaction Identifier
Fact table in
the ZAGI Retail
Company
dimensional
model for the
subject sales,
populated with
the data,
including the
transaction
identifier
values
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 28
Expanded Example: Dimensional Model with Transaction Time
ER diagram : Transaction time attribute added to SALESTRANSACTION in the ZAGI Retail
Company Sales Department Database (Source 1).
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 29
Expanded Example: Dimensional Model with Transaction Time
ZAGI Retail Company dimensional model for the subject sales with time included
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 30
Expanded Example: Dimensional Model Based on Multiple Sources
Fact table in
the ZAGI
Retail
Company
dimensional
model for
the subject
sales,
populated
with
the data,
including
the time
values
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 31
DIMENSIONAL MODELING
▪ Multiple facts in a dimensional model
• When multiple subjects of analysis can share the same dimensions, a
dimensional model contains more than one fact table
• A dimensional model with multiple fact tables is referred to as a
constellation or galaxy of stars
• This approach enables:
o Quicker development of analytical databases for multiple subjects of
analysis, because dimensions are re-used instead of duplicated
o Straightforward cross-fact analysis
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 32
Expanded Example: Dimensional Model Based on Multiple Sources
ER diagram : ZAGI Retail Company Quality Control Database (Source 4)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 33
Expanded Example: Dimensional Model Based on Multiple Sources
Data records: ZAGI Retail Company Quality Control Database (Source 4)
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 34
Expanded Example: Dimensional Model Based on Multiple Sources
ZAGI Retail Company dimensional model for the subjects sales and defects
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 35
Expanded Example: Dimensional Model Based on Multiple Sources
ZAGI Retail Company dimensional model for the subjects sales and defects , populated
with the data from the four sources
Jukić, Vrbsky, Nestorov, Sharma – Database Systems Copyright (c) 2023-25 Nenad Jukic and Prospect Press Chapter 8 – Slide 36