[go: up one dir, main page]

0% found this document useful (0 votes)
95 views31 pages

Data Warehouse Architecture Lecture

This document discusses data warehouse architecture and data modeling concepts. It covers distributed data warehouse architectures, including geographically and technologically distributed warehouses. It then discusses data modeling, including the conceptual, logical, and physical design phases. The conceptual design phase transforms requirements into a conceptual data model using entity-relationship modeling to define entities, attributes, and relationships at a high level.

Uploaded by

hassan499
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
95 views31 pages

Data Warehouse Architecture Lecture

This document discusses data warehouse architecture and data modeling concepts. It covers distributed data warehouse architectures, including geographically and technologically distributed warehouses. It then discusses data modeling, including the conceptual, logical, and physical design phases. The conceptual design phase transforms requirements into a conceptual data model using entity-relationship modeling to define entities, attributes, and relationships at a high level.

Uploaded by

hassan499
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 31

Lecture 04

Data Warehouse Architecture


Conceptual Model
Summary – last week
• Last week:
– DW Architecture
– Storage Architecture
– Tier Architecture

• This week:
– Distributed DW
– DW Data Modeling
– Conceptual Model
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 2
Distributed DW
• In most cases the economics and technology
greatly favor a single centra lized DW
• But in some cases,distributed DW make sense
• Types of distributed DW
– Geog raphically distributed
• Local DW instances of global DW
– Technolog ically distributed DW
• Logically one DWbut data is stored on multiple stores

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 3
Distributed DW (cont’d.)
• Geog raphica lly distributed
– In the case of corporations spread around the world
• Information is needed both locally and g loba lly
– A distributed DW makes sense
• When much processing occurs at the local level
• Even though local branches report to
the same balance sheet,the local
organizations are their own companies

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 4
Distributed DW (cont’d.)
All IBM Sybase
Europe Asia
Site A
Site B
Local DW IBM/Teradata
Local DW
Local Local
operational USA
operational
processing
HQ processing

Local DW
Local
operational
processing

Global DW

5
Distributed DW (cont’d.)
• Technolog ica lly distributed DW
– Placing the DW on the distributed technology
of a vendor
– Advantages
• The entry cost is cheap – large centralized
hardware is expensive
• No theoretical limit to how much data can be placed in
the DW – we can add new servers to the network

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 6
Distributed DW (cont’d.)
– As the DW starts to expand network data
com m unication starts playing an important
role
• Example:Let’s simplify and consider we have 4 nodes
holding each data regarding the last 4 years
• Now let’s consider we have a
query which needs to access
the data from the last 4 years: 2008
2007
such a query arises the issue 2006
2005
of transporting large amount
of data between processors

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 7
Data Modeling
• Da ta M odeling / DB Desig n - B a sics
– Is the process of creating a data m odel by analyzing
the requirements needed to support the business
processes of an organization
• It is sometimes called da ta ba se
m odeling /desig n because a data
model is eventually implemented
in a database

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 8
Data Modeling (cont’d.)
• Da ta m odels
– Provide the definition and form at of data
– Graphical representations of the data within a
specific area of interest
• Enterprise Data Model: represents the integrated
data requirements of a complete business
organization
• Subject Area Data Model: Represents the data
requirements of a single business area or
application

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 9
Data Modeling (cont’d.)
Conceptual
• Conceptual Desig n Design
– Transforms data requirements to conceptua l m odel
– Conceptual model describes data entities,relationships,
constraints, etc.on hig h-level Logical
• Does not contain any implementation details Design
• Independent of used software and hardware
• L og ical Desig n (next lecture) Physical
– Maps the conceptual data model to the log ica l da ta m odel Design
used by the DBMS
• e.g.relational model,dimensional model,…
• Technology independent conceptual model is adapted to the
used DB MS software
• P hysical Desig n (2 nd next lecture)
– Creates internal structures needed to efficiently store/manage
data
• Table spaces,indexes,access paths,…
• Depends on used hardware and DBMS software

10
Data Modeling (cont’d.)
• Going from one phase to the next:
• The phase must be complete
– The result serves as input for the next phase
• Often automatic transition is possible with additional
designer feedback

Logical
Design
Physical
ER-diagram, Design
UML, … Tables,
Columns, … Tablespaces,
Indexes, …

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 11
Conceptual Model
• Highest conceptua l grouping of ideas
– Data tends to naturally cluster with data from
the same or similar categories relevant to the
organization
• The m a jor rela tionships between subjects have
been defined
– Least amount of detail

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 12
Conceptual Model (cont’d.)

• Conceptual design
– E ntity-Relationship (E R) Modeling
• Entities -“things” in the real world
– E.g.Car,Account,Product Car Account Product
• Attributes – property of an entity,entity type,or
relationship type Car Color
– E.g.color of a car,balance of an account,price of a product
• Relationships – between entities there can be relationships,
which also can have attributes
– E.g. Person owns Car
Person owns Car

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 13
Conceptual Model (cont’d.)
day of room
week
registration time semester id
number
N N Lecture 1 N
Student attends
instance
teaches Professor
1 1
name
instantiates
name department

title credits
N
id
N
enrolls Lecture
N N
prereq.

part of
N N curriculum
Course of semester
Study
name

14
Conceptual Model (cont’d.)

• Conceptual design in usually done using the


U n ified M odeling L a ng uag e (U M L )
– Class Diagram,Component Diagram,Object
Diagram,Package Diagram…
– For Data Modeling only Class Diag ram s are used
• Entity type becomes cla ss
CLASS NAME
• Relationships become a ssocia tions attribute 1 : domain

• There are special types of associations like: attribute n : domain
aggregation,composition,or generalization
operation 1

operation m

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 15
Logical Model

• Logical design a rra ng es da ta into a


logical structure
– Which can be mapped into the storage objects
supported by DBMS
• In the case of RDB,the storage objects are tables which
store data in rows and columns Attribute

Tuple

Relation

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 16
Physical Model

• Physical design specifies the physica l


config ura tion of the database on the
storage media
– Detailed specification of:
data elements,data types,
indexing options,and
other parameters
residing in the DBMS
data dictionary

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 17
Data Model in DW

• Managing Complex Data Relationships


– Helps keep track of the complex environment that is
a DW
• Many complex relationships exist,with the ability to change
over time
– Transformations and integration from various systems
of record need to be worked out and maintained
– Provides the means of supplying users with a
roadm ap through the data and relationships

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 18
Conceptual Model in DW
• Modeling business queries
– Goal
• Define the purpose,and decide on the subject(s) for the
data warehouse
Time
Identify questions of interest
– Subject Customers Business Store

• Who bought the products? Model

(customers structure)
• Who sold the product? (store/sales Products

organization structure)
• What was sold? (product structure)
• When was it sold? (time structure)

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 19
Conceptual Model in DW (cont’d.)
• For Conceptua l desig n in DW
conventional techniques like E/R or UML are
not appropriate
– Lack of necessary semantics for modeling the
multidimensional data model
– E/R are constituted to
• Remove redundancy in the data model
• Facilitate retrieval of individual records
– Therefore optimize OLTP
– In the case of DW, however redundancy and
MaterializedViews help speed upAnalytical queries

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 20
Conceptual Model in DW (cont’d.)
• Components
– Facts: a fact is a focus of interest for decision-making,
e.g.,sales,shipments..
– Measures: attributes that describe facts from
different points of view,e.g.,each sale is measured by
its revenue
– Dim ensions: discrete attributes which
determine the granularity adopted to represent
facts,e.g., product,store,date
– Hierarchies: are made up of dimension attributes
• Determine how facts may be aggregated and selected,e.g.,
day – month – quarter - year

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 21
Conceptual Model in DW (cont’d.)
• Conceptual design models for DW
– Multidimensional Entity Relationship (ME /R) Model
– Multidimensional UML (m U ML )
– Other methods e.g., Dimension Fact Model,
Totok approach,etc.

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 22
Multidimensional ER Model
• M E /R M odel
– Its purpose is to create an intuitive representation
of the multidimensional data that is optimized for
high-performance access
– It represents a specialization and evolution of the E/R to
allow specification of m ultidim ensional
sem antics

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 23
Multidimensional ER Model (cont’d.)
• ME/R notation was influenced by the following
considerations
– Specialization of the E/R model
• All new elements of the ME/R have to be specializations of the
E/R elements
• In this way the flexibility and power of expression of the E/R
models are not reduced
– Minimal expansion of the E/R model
• Easy to understand/learn/use: the number of
a dditiona l elem ents should be small
– Representation of the multidimensional semantics
• Although being minimal,it should be powerful enough to be able
to represent m ultidim ensiona l sem a ntics

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 24
Multidimensional ER Model (cont’d.)

• There are 3 main M E /R constructs


– The fact node
– The level node
– A special binary classification edge

Fact Classification level

Characteristics

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 25
Multidimensional ER Model (cont’d.)
• Lets consider a store scena rio designed in E/R
– Entities bear little semantics
– E/R doesn’t support classification and aggregation levels

Package District City Name


Date
1
Is
packed
in n n m
Article is sold Store Is in
n
Belongs
to
Article Nr
1
Product
group

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 26
Multidimensional ER Model (cont’d.)

• M E /R notation:

Prod. Categ Prod. Family Prod. Group Article

Dimension: Geography

Sales
Country Region District City Store
Characteristics
Pre-aggregated sales at country level
So no extra joins are required Week

Year Quarter Month Day

27
Multidimensional ER Model (cont’d.)
• M E /R notation:
– S ales was elected as fact node
– The dimensions are product, geog raphical
area and tim e
– The dimensions are represented Article

Sales
through the so called B asic Store
Characteristics
Classification L evel Day

– Alternative paths in the classification level are also


possible Week
Month Day

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 28
Summary
– Tier Architecture
– Distributed DW
– DW Data Modeling

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 29
Summary (cont’d.)

• DW are usually distributed geographically and


technologically
• Data Modeling – Conceptual Modeling
– In conceptual modeling for DW,conventional
techniques like E/R or UML are not appropriate
– Appropriate methods are:
• Multidimensional Entity Relationship (ME/R) Model
• Multidimensional UML (mUML)

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 30
Next Lecture

• Data Modeling (continued)


– Logical model
– Cubes, Dimensions,
Hierarchies, Classification
Levels

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 31

You might also like