UNIT-1 Data Warehousing
1. Components of Data Warehousing
- Data Sources: These are the origin points of data. They can be databases, flat files, or
external sources.
- Data Staging Area: This area involves ETL (Extract, Transform, Load) processes. Data is
extracted from sources, transformed into a suitable format, and loaded into the data
warehouse.
- Data Storage: This is the central repository where data is stored. It includes relational
databases and multidimensional databases.
- Data Presentation: Tools and interfaces for querying and reporting data.
- Metadata: Data about data. It describes the structure, operations, and contents of the
data warehouse.
- Management and Control: Tools to manage, monitor, and control the data warehouse
operations.
Application: A retail company uses data warehousing to combine data from sales,
inventory, and customer databases to analyze trends and performance.
Diagram:
```
Data Sources -> Data Staging Area -> Data Storage -> Data Presentation
| |
Metadata Management
and Control
```
2. Need for Data Warehousing
- Integration of Data: Combining data from various sources for a unified view.
- Historical Analysis: Storing historical data for trend analysis over time.
- Improved Decision-Making: Providing insights and analytics to support business
decisions.
- Consistency: Ensuring data consistency across different departments.
Example: A healthcare provider uses a data warehouse to integrate patient records,
treatment plans, and billing information to improve patient care and operational efficiency.
3. Building and Interpretation of Data Warehouse
- Requirements Gathering: Understanding business needs and data requirements.
- Data Modeling: Designing the logical and physical structure of the data warehouse.
- ETL Process: Extracting data from sources, transforming it, and loading it into the
warehouse.
- OLAP Tools: Implementing tools for data analysis and reporting.
- Testing and Deployment: Validating the data warehouse and deploying it for use.
Example: A financial institution builds a data warehouse to track transactions, customer
data, and financial reports, providing a comprehensive view of operations.
4. Data Mart
- Definition: A subset of a data warehouse, focused on a specific business line or team.
- Types: Dependent (sourced from a central data warehouse) and Independent
(standalone, not linked to a central warehouse).
Example: A marketing data mart within a retail company might contain sales data,
customer demographics, and campaign performance metrics.
5. Multi-Dimensional Data Model and Related Schemas
- Multi-Dimensional Data Model: Organizes data into dimensions and measures.
Dimensions are perspectives (e.g., time, geography), and measures are numerical data
(e.g., sales, profit).
- Schemas:
- Star Schema: Central fact table connected to dimension tables.
- Snowflake Schema: Similar to star schema but with normalized dimension tables.
- Galaxy Schema: Multiple fact tables sharing dimension tables.
Example: An e-commerce company uses a star schema to analyze sales data by time,
product, and region.
Diagram:
```
Star Schema:
Fact Table -> Dimension Table 1
-> Dimension Table 2
-> Dimension Table 3
```
6. Data Warehouse and OLAP Technology
- Online Analytical Processing (OLAP): Technology that allows users to analyze data from
multiple perspectives.
- OLAP Vs OLTP:
- OLAP (Online Analytical Processing): Focuses on complex queries, historical data
analysis, and decision support. E.g., analyzing sales trends.
- OLTP (Online Transaction Processing): Focuses on day-to-day transaction processing.
E.g., recording a sales transaction.
Different OLAP Operations:
- Slice and Dice: Viewing data from different angles.
- Drill Down/Up: Navigating from summary to detailed data and vice versa.
- Pivot (Rotate): Reorganizing dimensions for different perspectives.
- Roll-up: Aggregating data along a dimension.
Example: A company uses OLAP to drill down from annual sales data to quarterly, monthly,
and daily sales data to identify trends.
Diagram:
```
OLAP Cube:
| Product
Region |----------------- Time
```
Applications of Data Warehousing:
- Retail: Analyzing customer purchasing patterns and inventory management.
- Healthcare: Integrating patient data for better treatment outcomes.
- Finance: Monitoring transactions and financial reporting.
- Telecommunications: Analyzing call records and network usage.
These components and applications demonstrate how data warehousing and OLAP
technologies enable businesses to make informed decisions by leveraging integrated,
historical data for comprehensive analysis.