0% found this document useful (0 votes)

5 views14 pages

DWDM Module 2

The document provides an overview of data warehouses, highlighting their characteristics, architecture, and the distinction between operational databases and data warehouses. It discusses the ETL process, OLAP operations, and various data warehouse models, including enterprise warehouses, data marts, and virtual warehouses. Additionally, it covers data cube aggregation, conceptual modeling, and the design process for data warehouses, emphasizing the importance of metadata and OLAP query processing.

Uploaded by

aaradhytiwari.25

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views14 pages

DWDM Module 2

Uploaded by

aaradhytiwari.25

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

Zettel

DWDM Module 2

Data Warehouse
• a decision support database that is maintained separately from the organization's operational
database. It supports info-processing by providing a platform of consolidated, historical data
for analysis.
• A data warehouse is a
1. subject-oriented : organized around subjects such as customer, products, sales. Focuses on
modelling and analysis of this data and not dailt ops and transactions.
Provide a simple and concise view of these subjects by excluding unnecessary details.
2. integrated : built by combining multiple heterogeneous data sources. Data cleaning and
integration methods are applied to ensure consistency in all naming conventions, encoding,
measures, etc. Data is usually transformed into a conforming format after before integrating
to the warehouse.
3. time-variant : the time horizon for warehouses is much longer than of ops systems. they
provide historical data going back decades while the Ops systems focus on day-to-day
current value. Every warehouse item contains an element of time.
4. non-volatile : a physically separate store of data is transformed from Ops env to form a
warehouse. Operational updates of data do not occur in WHs, no transaction processing,
recovery or concurrency either. The only 2 operations are the initial data-loading and access
of data.
collection of data in support of management's decision making process.

3-Tier Warehouse architecture

Warehouse Development Approach
OLTP - Online Transaction Processing v/s OLAP - Online
Analytical Processing

Why a sperate Warehouse?

• DBMS are tuned for OLTP access methods, recovery, indexing and concurrency.
• WH are tuned for OLAP with complex queries, multidimensional views and data consolidation.
• Decision support requires historical data which is not available in Ops systems
• Decision support via analytics require consolidation of various different data sources which is
only possible in WH via various cleaning, integration and mining methods.
• Different sources typically have inconsistent data types and need to be transformed and
integrated first into a WH to be of any good use.
• Newer systems are now performing OLAP directly on Operation RDBMSs

DATA CUBE Aggregation

Aggregating data into lower levels and into multiple levels of data cubes reduce the size of data to
deal with and these small representations hence speed up query resolution and solution to tasks.
MULTIDIMENSIONAL Data
A single subject may be represented as a
function of multiple other data points.
Sales --> f ( Product, Month, Region)
Hierarchical summarization paths
Data warehouses also allow for multi-linked views of the data stored in them.
DIMENSIONAL TABLES : store subject info such as item (name, brand. type)
FACT TABLES : store measures such as "dollars_sold" and the keys to each of the related dimension
tables.

A n-D base cube is a base cuboid while the O-D cuboid with highest level of sdummarization is
called the apex cuboid. This lattice of cuboids forms a data cube.
CONCEPTUAL MODELING of Data Warehouses
1. Star Schema: A fact table is in the middle;
connected to a set of dimensiona tables.

1. Snowflake Schema : a
refinement of Star. Some
dimensional hierarchy is
normalized into a set of
smaller dimension tables;
forming a shape similar to
a snowflake.

1. Fact constellations : Multiple fact tables

share dimension tables, viewed as
collection of stars hence also called Galaxy
schema.
Concept Hierarchy in Warehouses
organizes concepts hierarchically and is associated with each dimension in a data WH.
This facilitates drilling (higher to lower concepts) and rolling (get less specific, going up the
hierarchy) in data warehouse to view data in multiple levels of granularity.
Recursively reduces low level concepts (ages - 11, 12, 13 ... 99) with higher level concepts ( youth ,
adult ,senior )
These hierarchies can be explicitly stated by domain experts or WH designers
street <- city <- state <- country
or left for automation. Numerical data is automatically set into hierarchies by Discretization
methods. Attribute with most distinct values automatically placed below and the the one with least
is placed atop. so naturally many streets = street at lowest level while countries at top level.
THREE WAREHOUSE MODELS
ENTERPRISE WAREHOUSE : collects all info about the subjects spanning the entire org.
DATA MART : a subset of the corporate-wide data that is of value to a specific group of users. Its
scope is confined to the specific group. For example the marketing department will have a different
data mart than the Engineering department.
These marts can be independent or dependant on the mainwarehouse.
VIRTUAL WAREHOUSE : A set of views over operational DBs. Only few "possible" summary views
can be materialized as this is STILL an Ops system and not a dedicated WH.

EXTRACTION, TRANSFORMATION, LOADING - ETL

1. Data Extraction : get data from multiple sources, heterogeneous in nature.
2. Data Cleaning : detect errors, incomplete points and outliers in th data and rectify.
3. Data Transformation : convert data from inconsistent legacy formats to consistent warehouse
format.
4. Load : sort, summarize, consolidate , aggregate, check integrity, build indices and partitions and
compute possible views / representations.
5. Refresh : propagate the updates from the data sources to the warehouse.

META DATA REPOSITORY

• meta data is the data that defines other data; in this case the warehouse objects.
• It stores the structure of the data WH, schema, views, dimensions. hierarchies, derived data,
marts and contents.
• Operational meta data includes "data lineage" (how the source provided info, how it was
cleaned, transformed and integrated), "currency of data" (active, archived, purged) and
"monitoring info" (usage stats, errors and audits)
• also stors algorithms used for many data miniung tasks like summarization
• mapping from Ops systems and env to data WH
• data related to performance and maintenance
• Business data - terms and definitions, ownership of data and various policies

OLAP OPERATIONS
• Roll Up (drill-up) : climb up the hierarchy; summarize; dimension reduction
• Drill Down (roll down) : climb down the hierarchy; detailed data; dimenions increase
• Pivot : reorient the cube / visualization to 2D planes.
• Slice : selecting a single dimension from a cube, effectively creating a new sub-cube. This
operation reduces the dimensionality of the data by fixing a specific value for one of the
dimensions. ITEM in year 2008 (so time fixed and rest 2 same)
• Dice : selecting specific years and product categories and excluding EVERRYTHING else. this
reduces dimensions and focus on the selected things.
• Drill across : involve more than once fact table
• Drill through : through the bottom of the cube to its actual backend relational tables.
Star-Net Query Model
enhances the traditional star schema by introducing a network of dimensions and supporting
hierarchical relationships, allowing for more complex and flexible data analysis. It is particularly
useful in data warehousing and OLAP systems, where users need to perform multidimensional
queries efficiently and intuitively.
• Central fact table with dimension tables.
• Interconnected dimensions for complex queries.
• Supports drill-down and roll-up analyses.
• Allows multidimensional analysis from various perspectives.
DATA WAREHOUSE Views
1. Top-Down View: Focuses on selecting relevant information necessary for the data warehouse.
2. Data Source View: Exposes information captured, stored, and managed by operational systems.
3. Data Warehouse View: Comprises fact tables and dimension tables.
4. Business Query View: Represents data perspectives from the end-user's viewpoint.

DATA WAREHOUSE Design Process

• Top-Down: Comprehensive design and planning for a mature data warehouse aligned with
business goals.
• Bottom-Up: Rapid development through experiments and prototypes, focusing on immediate
needs.
• Combination: Merges both approaches for flexibility and strategic alignment.
Software Engineering Perspectives:
• Waterfall Model: Structured, step-by-step analysis; each phase completed before the next.
• Spiral Model: Iterative development with quick turnaround; continuous refinement based on
feedback.
Typical Design Process:
1. Choose a Business Process: Identify key processes (e.g., orders, invoices).
2. Choose the Grain: Define the atomic level of data (e.g., individual transactions).
3. Choose the Dimensions: Identify relevant dimensions (e.g., Time, Product, Customer).
4. Choose the Measures: Define measures for fact tables (e.g., sales amount, quantity sold).

DATA WAREHOUSE Usage / Application

1. Information Processing:
• Supports querying, basic statistical analysis, and reporting.
• Utilizes crosstabs, tables, charts, and graphs.

2. Analytical Processing:
• Enables multidimensional analysis of data.
• Supports OLAP operations like slice, dice, drilling, and pivoting.

3. Data Mining:
• Focuses on knowledge discovery from hidden patterns.
• Supports associations, analytical model construction, classification, prediction, and
visualization of results.

From OLAP to OLAM - On Line Analytical Mining

• high quality of data in data warehouses
• Open database connectivity, web accessing, reporting and OLAP tools
• Online selection of data mining functions

Efficient Cube Computation

How many cuboids in an n-dimensional cube with L levels?

Materialization of data cubes can be full, partial or none. It depends on the data's size, sharing and
access freqeuency.

Compute Cubes
Text Text

define cube sales [item, city, year]: SELECT item, city, year, SUM (amount)
sum (sales_in_dollars) FROM SALES
compute cube sales CUBE BY item, city, year

Indexing OLAP Data

• index on a a particular column.
• each value has a bit vector; bit op is fast
• length of bit increases with number of
records so not suitable for high cardinality
data
• Word-Aligned Hybrid (WAH) makes it work
for such cases too now...
• In data warehouses the join index relates
the values of dimensions of a star schema
to a fact table. This speeds up relational
join operations. ex: Sales fact table for 2
dimensions of City and Product.

OLAP Query Processing

Determine what operations (drill, roll, slice, dice...) to perform on the available cuboid views.
Convert these operations into corresponding SQL/OLAP operations.
ROLAP - Relational OLAP
• Use relational or extended-relational DBMS to store and manage WH data and middle ware.
• Include optimization of DBBS backend and implementation of aggregation and navigation logic
along with additional tools and services
• greater scalability

MOLAP - Multidimensional OLAP

• sparse arra-based multidimensional summarized data
• fast-indexing to pre-computed data

HOLAP - Hybrid OLAP

• flexibility; ex: low level : relational, high-level: array.

Specialized SQL Servers

• specialized support for SQL queries over star/snowflake schemas

Attribute-oriented induction of Data in Warehouse

• collect task-relevant data using a relational database query
• perform generalization by
1. Attribute removal : CASE1 - there's a large set of distinct values or there is no concept of
hierarchy for the attribute. CASE2 - its higher level concepts are described in terms of other
attributes; then the attribute should be removed. ex: Customer ID, Order ID, and other similar
key values are all unique.
2. Attribute generalization : if there exists a set of generalization for distinct values of an
attribute then it should be selected and applied. ex: age in (youth, adult, senior) , address in
(state, country) , salaries sizes, etc.

• Apply aggregation by merging identical, genberalized tuples and accumulate their counts.
• Interact with users for knowledge visualization
Suppose that a user wants to describe the general characteristics of graduate students in the Big
University database. Given the attributes name, gender, major, birth place, birth date, residence,
phone#, and gpa.
Text Text

DMQL QUERY TRANSFORMED QUERY

use Big University DB use Big University DB

mine characteristics as “Science select name, gender, major, birth
Students” place, birth date, residence,
in relevance to name, gender, major, phone#, gpa
birth place, birth date, residence, from student where status in
phone#, gpa {“M.Sc.,” “M.A.,” “M.B.A.,” “Ph.D.”}
from student
where status in “graduate”

Data Warehousing and Data Mining: UNIT-1
No ratings yet
Data Warehousing and Data Mining: UNIT-1
118 pages
Data Warehouse
No ratings yet
Data Warehouse
174 pages
04OLAP
100% (1)
04OLAP
58 pages
Data Warehousing and On-Line Analytical Processing
No ratings yet
Data Warehousing and On-Line Analytical Processing
40 pages
Module-3 Data Warehousing
No ratings yet
Module-3 Data Warehousing
44 pages
Unit 1 - Data Warehouse
No ratings yet
Unit 1 - Data Warehouse
21 pages
Chap3 PIEAS DCIS BSCIS DM 23 Topic 03 DWH OLAP
No ratings yet
Chap3 PIEAS DCIS BSCIS DM 23 Topic 03 DWH OLAP
46 pages
Data Warehousing Essentials
No ratings yet
Data Warehousing Essentials
39 pages
04OLAP Editted v1
No ratings yet
04OLAP Editted v1
59 pages
04DWH & Olap
No ratings yet
04DWH & Olap
50 pages
Warehouse
No ratings yet
Warehouse
58 pages
Chapter 1
No ratings yet
Chapter 1
9 pages
04OLAP
No ratings yet
04OLAP
58 pages
Chap3 PIEAS DCIS BSCIS DM 23 Topic 03 DWH OLAP
No ratings yet
Chap3 PIEAS DCIS BSCIS DM 23 Topic 03 DWH OLAP
46 pages
04OLAP
No ratings yet
04OLAP
35 pages
03 04OLAP SKJ Edited Oct 1, 2024
No ratings yet
03 04OLAP SKJ Edited Oct 1, 2024
93 pages
(2025!04!03) - Data Warehouse - Lecture 3
No ratings yet
(2025!04!03) - Data Warehouse - Lecture 3
41 pages
Wa0020.
No ratings yet
Wa0020.
58 pages
Data Warehouse Concepts & Models
No ratings yet
Data Warehouse Concepts & Models
7 pages
Data Mining:: Concepts and Techniques
No ratings yet
Data Mining:: Concepts and Techniques
48 pages
Unit 2 - Data Science BCA
No ratings yet
Unit 2 - Data Science BCA
20 pages
Concepts and Techniques: - Chapter 4
No ratings yet
Concepts and Techniques: - Chapter 4
51 pages
Data Warehouse
No ratings yet
Data Warehouse
23 pages
Warehouse
No ratings yet
Warehouse
60 pages
Data Mining 4
No ratings yet
Data Mining 4
59 pages
Chapter 4
No ratings yet
Chapter 4
29 pages
ch4 DW Summary
No ratings yet
ch4 DW Summary
8 pages
Lecture 4 (Dataware Housing)
No ratings yet
Lecture 4 (Dataware Housing)
50 pages
Data Warehousing: Lecturer: Dr. Nguyen Thi Ngoc Anh
No ratings yet
Data Warehousing: Lecturer: Dr. Nguyen Thi Ngoc Anh
23 pages
FDS Unit 2
No ratings yet
FDS Unit 2
21 pages
Chap3-Data Warehousing and OLAP
No ratings yet
Chap3-Data Warehousing and OLAP
67 pages
Concepts and Techniques: Data Mining
No ratings yet
Concepts and Techniques: Data Mining
61 pages
Data Warehousing - CH2
No ratings yet
Data Warehousing - CH2
26 pages
Data Warehousing Essentials
No ratings yet
Data Warehousing Essentials
66 pages
DMDW Operations
No ratings yet
DMDW Operations
65 pages
Data Warehousing
100% (1)
Data Warehousing
51 pages
Data Warehousing Guide for IT Students
No ratings yet
Data Warehousing Guide for IT Students
77 pages
Unit 2 Datawarehouse
No ratings yet
Unit 2 Datawarehouse
58 pages
Data Warehouse Notes
No ratings yet
Data Warehouse Notes
41 pages
Data Mining Unit 1
No ratings yet
Data Mining Unit 1
46 pages
DataMining and Data Warehousing
No ratings yet
DataMining and Data Warehousing
96 pages
FALLSEM2023-24 CSI3010 ETH VL2023240104197 2023-07-26 Reference-Material-I
No ratings yet
FALLSEM2023-24 CSI3010 ETH VL2023240104197 2023-07-26 Reference-Material-I
28 pages
Concepts and Techniques: - Chapter 4
No ratings yet
Concepts and Techniques: - Chapter 4
58 pages
Concepts and Techniques: - Chapter 4
No ratings yet
Concepts and Techniques: - Chapter 4
58 pages
04OLAP
No ratings yet
04OLAP
66 pages
Data Warehousing and OLAP Technology For Data Mining: What Is A Data Warehouse?
No ratings yet
Data Warehousing and OLAP Technology For Data Mining: What Is A Data Warehouse?
98 pages
04OLAP
No ratings yet
04OLAP
50 pages
04OLAP
No ratings yet
04OLAP
48 pages
An Overview of Data Warehousing and OLAP Technology What Is Decision Support?
No ratings yet
An Overview of Data Warehousing and OLAP Technology What Is Decision Support?
4 pages
4-Data Warehousing and Integration in Business
No ratings yet
4-Data Warehousing and Integration in Business
39 pages
Data Warehouse Components
No ratings yet
Data Warehouse Components
26 pages
DMW Unit 1
No ratings yet
DMW Unit 1
56 pages
Data Warehousing, Business Analytics and Online Analytical - 1
No ratings yet
Data Warehousing, Business Analytics and Online Analytical - 1
35 pages
Data Warehousing Concepts Guide
No ratings yet
Data Warehousing Concepts Guide
68 pages
2025-Handouts - OLAP - Lecture 1
No ratings yet
2025-Handouts - OLAP - Lecture 1
10 pages
Data Warehouse
No ratings yet
Data Warehouse
19 pages
Data - Mining - Warehousing Unit I
No ratings yet
Data - Mining - Warehousing Unit I
45 pages
Ran Offline
No ratings yet
Ran Offline
3 pages
Basic Settings For Approval: Short Text
No ratings yet
Basic Settings For Approval: Short Text
27 pages
C79722AA B1017-2 (ANAEROBE) (Pag.32-38)
No ratings yet
C79722AA B1017-2 (ANAEROBE) (Pag.32-38)
191 pages
Case Study For Oligohydramnios
100% (1)
Case Study For Oligohydramnios
8 pages
Site Inspection Report: Robig Builders
No ratings yet
Site Inspection Report: Robig Builders
4 pages
Drug
No ratings yet
Drug
4 pages
Viral Pin Formula-2019
No ratings yet
Viral Pin Formula-2019
62 pages
CN 17 en
No ratings yet
CN 17 en
2 pages
Mypna Se g10 Sel House Taken Web
No ratings yet
Mypna Se g10 Sel House Taken Web
14 pages
Chord Progression
No ratings yet
Chord Progression
7 pages
Fluid Mechanics With Engineering Applications 10th Economy Edition2001 by E. FinnemoreJoseph Franzini
0% (1)
Fluid Mechanics With Engineering Applications 10th Economy Edition2001 by E. FinnemoreJoseph Franzini
5 pages
Dual Blade Skinning Sk11-312
No ratings yet
Dual Blade Skinning Sk11-312
4 pages
5-Hazber-CV-College Level
No ratings yet
5-Hazber-CV-College Level
1 page
The Quantum Software Lifecycle: Benjamin Weder Johanna Barzen Frank Leymann
No ratings yet
The Quantum Software Lifecycle: Benjamin Weder Johanna Barzen Frank Leymann
8 pages
Modelling of Tension-Stiffening in Bending RC Elements Based On Equivalent Stiffness of The Rebar
No ratings yet
Modelling of Tension-Stiffening in Bending RC Elements Based On Equivalent Stiffness of The Rebar
21 pages
The Foundation Engineering Handbook-Chapter3
No ratings yet
The Foundation Engineering Handbook-Chapter3
71 pages
Article Summary Worksheet: Upholding Policies
No ratings yet
Article Summary Worksheet: Upholding Policies
2 pages
Aphindiwe 3
100% (22)
Aphindiwe 3
234 pages
Blood Bank I
No ratings yet
Blood Bank I
136 pages
Harvey and Penzo - Parenting A Child Who Has Intense Emotions
100% (2)
Harvey and Penzo - Parenting A Child Who Has Intense Emotions
225 pages
Firewatch Guidelines for Safety
No ratings yet
Firewatch Guidelines for Safety
1 page
ProMax Troubleshooting Guide
No ratings yet
ProMax Troubleshooting Guide
2 pages
Huawei Network Configuration Guide
No ratings yet
Huawei Network Configuration Guide
50 pages
Cm109 Usb Bluetooth Adaptor Manual
No ratings yet
Cm109 Usb Bluetooth Adaptor Manual
4 pages
3865022734PL ImpactAbsorption
No ratings yet
3865022734PL ImpactAbsorption
8 pages
Gecco V6.2
No ratings yet
Gecco V6.2
1 page
The Boston Photographs Essay Analysis
100% (4)
The Boston Photographs Essay Analysis
2 pages
Customer Segments Ver.2.0
No ratings yet
Customer Segments Ver.2.0
51 pages
D&D Monster: Awakened Snowman
No ratings yet
D&D Monster: Awakened Snowman
1 page
Manual Técnico Autoclave TUTTNAUER 5075
100% (1)
Manual Técnico Autoclave TUTTNAUER 5075
134 pages

DWDM Module 2

Uploaded by

DWDM Module 2

Uploaded by

Zettel

3-Tier Warehouse architecture

Why a sperate Warehouse?

DATA CUBE Aggregation

1. Fact constellations : Multiple fact tables

EXTRACTION, TRANSFORMATION, LOADING - ETL

META DATA REPOSITORY

DATA WAREHOUSE Design Process

DATA WAREHOUSE Usage / Application

From OLAP to OLAM - On Line Analytical Mining

Efficient Cube Computation

Indexing OLAP Data

OLAP Query Processing

MOLAP - Multidimensional OLAP

HOLAP - Hybrid OLAP

Specialized SQL Servers

Attribute-oriented induction of Data in Warehouse

DMQL QUERY TRANSFORMED QUERY

use Big University DB use Big University DB

You might also like