Lecture 04
Data Warehouse Architecture
Conceptual Model
Summary – last week
• Last week:
– DW Architecture
– Storage Architecture
– Tier Architecture
• This week:
– Distributed DW
– DW Data Modeling
– Conceptual Model
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 2
Distributed DW
• In most cases the economics and technology
greatly favor a single centra lized DW
• But in some cases,distributed DW make sense
• Types of distributed DW
– Geog raphically distributed
• Local DW instances of global DW
– Technolog ically distributed DW
• Logically one DWbut data is stored on multiple stores
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 3
Distributed DW (cont’d.)
• Geog raphica lly distributed
– In the case of corporations spread around the world
• Information is needed both locally and g loba lly
– A distributed DW makes sense
• When much processing occurs at the local level
• Even though local branches report to
the same balance sheet,the local
organizations are their own companies
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 4
Distributed DW (cont’d.)
All IBM Sybase
Europe Asia
Site A
Site B
Local DW IBM/Teradata
Local DW
Local Local
operational USA
operational
processing
HQ processing
Local DW
Local
operational
processing
Global DW
5
Distributed DW (cont’d.)
• Technolog ica lly distributed DW
– Placing the DW on the distributed technology
of a vendor
– Advantages
• The entry cost is cheap – large centralized
hardware is expensive
• No theoretical limit to how much data can be placed in
the DW – we can add new servers to the network
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 6
Distributed DW (cont’d.)
– As the DW starts to expand network data
com m unication starts playing an important
role
• Example:Let’s simplify and consider we have 4 nodes
holding each data regarding the last 4 years
• Now let’s consider we have a
query which needs to access
the data from the last 4 years: 2008
2007
such a query arises the issue 2006
2005
of transporting large amount
of data between processors
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 7
Data Modeling
• Da ta M odeling / DB Desig n - B a sics
– Is the process of creating a data m odel by analyzing
the requirements needed to support the business
processes of an organization
• It is sometimes called da ta ba se
m odeling /desig n because a data
model is eventually implemented
in a database
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 8
Data Modeling (cont’d.)
• Da ta m odels
– Provide the definition and form at of data
– Graphical representations of the data within a
specific area of interest
• Enterprise Data Model: represents the integrated
data requirements of a complete business
organization
• Subject Area Data Model: Represents the data
requirements of a single business area or
application
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 9
Data Modeling (cont’d.)
Conceptual
• Conceptual Desig n Design
– Transforms data requirements to conceptua l m odel
– Conceptual model describes data entities,relationships,
constraints, etc.on hig h-level Logical
• Does not contain any implementation details Design
• Independent of used software and hardware
• L og ical Desig n (next lecture) Physical
– Maps the conceptual data model to the log ica l da ta m odel Design
used by the DBMS
• e.g.relational model,dimensional model,…
• Technology independent conceptual model is adapted to the
used DB MS software
• P hysical Desig n (2 nd next lecture)
– Creates internal structures needed to efficiently store/manage
data
• Table spaces,indexes,access paths,…
• Depends on used hardware and DBMS software
10
Data Modeling (cont’d.)
• Going from one phase to the next:
• The phase must be complete
– The result serves as input for the next phase
• Often automatic transition is possible with additional
designer feedback
Logical
Design
Physical
ER-diagram, Design
UML, … Tables,
Columns, … Tablespaces,
Indexes, …
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 11
Conceptual Model
• Highest conceptua l grouping of ideas
– Data tends to naturally cluster with data from
the same or similar categories relevant to the
organization
• The m a jor rela tionships between subjects have
been defined
– Least amount of detail
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 12
Conceptual Model (cont’d.)
• Conceptual design
– E ntity-Relationship (E R) Modeling
• Entities -“things” in the real world
– E.g.Car,Account,Product Car Account Product
• Attributes – property of an entity,entity type,or
relationship type Car Color
– E.g.color of a car,balance of an account,price of a product
• Relationships – between entities there can be relationships,
which also can have attributes
– E.g. Person owns Car
Person owns Car
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 13
Conceptual Model (cont’d.)
day of room
week
registration time semester id
number
N N Lecture 1 N
Student attends
instance
teaches Professor
1 1
name
instantiates
name department
title credits
N
id
N
enrolls Lecture
N N
prereq.
part of
N N curriculum
Course of semester
Study
name
14
Conceptual Model (cont’d.)
• Conceptual design in usually done using the
U n ified M odeling L a ng uag e (U M L )
– Class Diagram,Component Diagram,Object
Diagram,Package Diagram…
– For Data Modeling only Class Diag ram s are used
• Entity type becomes cla ss
CLASS NAME
• Relationships become a ssocia tions attribute 1 : domain
…
• There are special types of associations like: attribute n : domain
aggregation,composition,or generalization
operation 1
…
operation m
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 15
Logical Model
• Logical design a rra ng es da ta into a
logical structure
– Which can be mapped into the storage objects
supported by DBMS
• In the case of RDB,the storage objects are tables which
store data in rows and columns Attribute
Tuple
Relation
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 16
Physical Model
• Physical design specifies the physica l
config ura tion of the database on the
storage media
– Detailed specification of:
data elements,data types,
indexing options,and
other parameters
residing in the DBMS
data dictionary
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 17
Data Model in DW
• Managing Complex Data Relationships
– Helps keep track of the complex environment that is
a DW
• Many complex relationships exist,with the ability to change
over time
– Transformations and integration from various systems
of record need to be worked out and maintained
– Provides the means of supplying users with a
roadm ap through the data and relationships
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 18
Conceptual Model in DW
• Modeling business queries
– Goal
• Define the purpose,and decide on the subject(s) for the
data warehouse
Time
Identify questions of interest
– Subject Customers Business Store
• Who bought the products? Model
(customers structure)
• Who sold the product? (store/sales Products
organization structure)
• What was sold? (product structure)
• When was it sold? (time structure)
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 19
Conceptual Model in DW (cont’d.)
• For Conceptua l desig n in DW
conventional techniques like E/R or UML are
not appropriate
– Lack of necessary semantics for modeling the
multidimensional data model
– E/R are constituted to
• Remove redundancy in the data model
• Facilitate retrieval of individual records
– Therefore optimize OLTP
– In the case of DW, however redundancy and
MaterializedViews help speed upAnalytical queries
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 20
Conceptual Model in DW (cont’d.)
• Components
– Facts: a fact is a focus of interest for decision-making,
e.g.,sales,shipments..
– Measures: attributes that describe facts from
different points of view,e.g.,each sale is measured by
its revenue
– Dim ensions: discrete attributes which
determine the granularity adopted to represent
facts,e.g., product,store,date
– Hierarchies: are made up of dimension attributes
• Determine how facts may be aggregated and selected,e.g.,
day – month – quarter - year
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 21
Conceptual Model in DW (cont’d.)
• Conceptual design models for DW
– Multidimensional Entity Relationship (ME /R) Model
– Multidimensional UML (m U ML )
– Other methods e.g., Dimension Fact Model,
Totok approach,etc.
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 22
Multidimensional ER Model
• M E /R M odel
– Its purpose is to create an intuitive representation
of the multidimensional data that is optimized for
high-performance access
– It represents a specialization and evolution of the E/R to
allow specification of m ultidim ensional
sem antics
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 23
Multidimensional ER Model (cont’d.)
• ME/R notation was influenced by the following
considerations
– Specialization of the E/R model
• All new elements of the ME/R have to be specializations of the
E/R elements
• In this way the flexibility and power of expression of the E/R
models are not reduced
– Minimal expansion of the E/R model
• Easy to understand/learn/use: the number of
a dditiona l elem ents should be small
– Representation of the multidimensional semantics
• Although being minimal,it should be powerful enough to be able
to represent m ultidim ensiona l sem a ntics
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 24
Multidimensional ER Model (cont’d.)
• There are 3 main M E /R constructs
– The fact node
– The level node
– A special binary classification edge
Fact Classification level
Characteristics
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 25
Multidimensional ER Model (cont’d.)
• Lets consider a store scena rio designed in E/R
– Entities bear little semantics
– E/R doesn’t support classification and aggregation levels
Package District City Name
Date
1
Is
packed
in n n m
Article is sold Store Is in
n
Belongs
to
Article Nr
1
Product
group
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 26
Multidimensional ER Model (cont’d.)
• M E /R notation:
Prod. Categ Prod. Family Prod. Group Article
Dimension: Geography
Sales
Country Region District City Store
Characteristics
Pre-aggregated sales at country level
So no extra joins are required Week
Year Quarter Month Day
27
Multidimensional ER Model (cont’d.)
• M E /R notation:
– S ales was elected as fact node
– The dimensions are product, geog raphical
area and tim e
– The dimensions are represented Article
Sales
through the so called B asic Store
Characteristics
Classification L evel Day
– Alternative paths in the classification level are also
possible Week
Month Day
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 28
Summary
– Tier Architecture
– Distributed DW
– DW Data Modeling
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 29
Summary (cont’d.)
• DW are usually distributed geographically and
technologically
• Data Modeling – Conceptual Modeling
– In conceptual modeling for DW,conventional
techniques like E/R or UML are not appropriate
– Appropriate methods are:
• Multidimensional Entity Relationship (ME/R) Model
• Multidimensional UML (mUML)
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 30
Next Lecture
• Data Modeling (continued)
– Logical model
– Cubes, Dimensions,
Hierarchies, Classification
Levels
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 31