
Data Warehousing and Business Intelligence

ELECTIVE COURSE FOR B.E. IN COMPUTER ENGINEERING


01

Overview
Data Warehouse

What Is Data Warehousing And Business Intelligence?


The process of storing all the company's data in internal or external databases from various sources
with the focus on analysis and generating actionable insights through online BI tools

Data Warehouse (DWH): COLLECT – INTEGRATE – STORE – ANALYZE – DISTRIBUTE – REACT

• Data repository for analytical purposes
• Data fed through heterogeneous sources for analysis and BI tools
• Not all data, but only the needed data

BI Architecture Diagram: heterogeneous sources (CRM, ERP, SAS, IMS, APIs) feed an integration layer of ETL engines (Extract, Transform, Load) via overnight batch; the data warehouse and data marts then serve reporting, dashboards, ad hoc analysis, applications, embedding and data-driven decisions.
3
Why Warehouse?

Data Warehouse
• Why can’t we just go straight to the data source?
• Optimized for analytical reporting
• Historical data
• Multiple integrated sources
• Many years of history
• Aggregated or summarized

• Separates analysis workload from transaction workload


• Performance enhancements tailored to reporting needs
• User-friendly
• Reusable report metadata
• Report creation doesn’t require coding

4
Characteristics of Data Warehouse

• Subject-oriented
• Data is categorized and stored by business subject
• Integrated
• Data on a given subject is collected from multiple sources and stored in a
single place
• Non-volatile
• Data in DW is not updated or deleted
• Time-variant
• Data is stored as a series of snapshots, each representing a period of time

5
WHY BI?

Business Intelligence (BI)

combines business analytics, data mining, data visualization, data tools and infrastructure, and best practices to help organizations make more data-driven decisions.

DWH and BI
• One without the other wouldn’t function.
• The data warehouse enhances the BI processes.

6
BI Tools & Technology

• Languages
• ETL and Data Integration
• Database Platform
• BI OLAP Tools
02

Dimensional Data Warehouse

Database Models

• A database model is a type of data model that determines the logical structure of
a database.
• A collection of tools for describing
• data
• data relationships
• data semantics
• data constraints
Database models

A data model consists of notations for expressing
• data structures
• integrity constraints
• operations
Database Models

• Relational Data Model: designs the data in the form of rows and columns within a table. Thus, a relational model
uses tables to represent data and the relationships between them. Tables are also called relations.
• An ER model is the logical representation of data as objects and the relationships among them. These objects are
known as entities, and a relationship is an association among these entities. A set of attributes describes each entity.
For example, student_name and student_id describe the 'student' entity.
• Object-based Data Model is an extension of the ER model with notions of functions, encapsulation, and object
identity, as well. This model supports a rich type system that includes structured and collection types.
• Semistructured Data Model: this model is different from the other three data models explained above. The
semistructured data model allows data specifications in places where individual data items of the same type may
have different attribute sets. The Extensible Markup Language (XML) is widely used for representing
semistructured data.
Data Models

• Other Types
• Hierarchical data model: Oldest form in tree structure. In this model, files are related in
a parent/child manner.

• Network data model: obsolete and being replaced by the relational model. In this model,
files are related as owners and members.
• The network model in a DBMS is a graph-based model used to represent
1:M/M:M relationships among records. Because it is represented in the form of a
graph, it is a simple and easy-to-construct database model.
Why Use Models?

• Models can be useful when we want to examine or manage part of the real world
• The costs of using a model are often considerably lower than the costs of using or
experimenting with the real world itself
EXAMPLES

• airplane simulator
• nuclear power plant simulator
• flood warning system
• model of economy
• map
Network model

• The network model was created to represent complex data


relationships more effectively when compared to hierarchical
models, to improve database performance and standards.
• It has entities which are organized in a graphical representation and
some entities are accessed through several paths. A User perceives
the network model as a collection of records in 1:M/M:M
relationships.

• Features of a Network Model −


• Ability to Merge Relationships − Because this model supports more
relationships, the data is more interconnected. It can manage
one-to-one relationships as well as many-to-many relationships.
• Many paths − There can be more than one path to the same
record because of more relationships. It makes data access fast and
simple.
• Circular Linked List − Operations in this model are performed with
the help of a circular linked list. The current position is maintained
by the application program, which navigates through the records
based on relationships.
Entity-Relationship Model

Example of schema in the entity-relationship model


Component of ER Model
ER-Model

• The ER model is extremely successful as a database design model


• Translation algorithms to many data models
• Commercial database design tools
• No database system is based on the model
ER-Model

• E-R model of real world


• Entities (objects)
• E.g. customers, accounts, bank branch
• Relationships between entities
• Crow's Foot Notation in ER Model – very successful in ERDs

• Widely used for database design


• Database design in E-R model usually converted to design in the relational model which
is used for storage and processing
Crow’s Foot Notation
Relational Model

• The relational model represents data as tables with columns and rows. Each row is known as a
tuple. Each column of a table has a name, known as an attribute.

• Domain: It contains a set of atomic values that an attribute can take.


• Attribute: It contains the name of a column in a particular table. Each attribute Ai must
have a domain, dom(Ai)
• Relational instance: In the relational database system, the relational instance is
represented by a finite set of tuples. Relation instances do not have duplicate tuples.
• Relational schema: A relational schema contains the name of the relation and name of all
columns or attributes.
• Relational key: one or more attributes that can uniquely identify a row in the relation.
Relational Model - Operations

• Powerful set-oriented query languages


• Insert, delete, and update capabilities
• Relational Algebra: a procedural language; describes how to compute a
query (a step-by-step process); uses operators like JOIN, SELECT,
PROJECT, UNION, INTERSECTION
• Relational Calculus: an alternative way of formulating queries; a non-procedural,
declarative language; describes the desired result – what to compute,
not how to compute it. E.g.: SQL, QBE.
• Join Operations: Outer, Inner, Equi
• Integrity Constraints: a set of rules to maintain quality; DML and other
operations have to be performed such that integrity is not affected. E.g.:
Domain constraint – data types like int, string, double, time; Entity Integrity
constraint – PK and rules for the PK; Referential Integrity – FK.
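A minimal sketch in standard SQL (hypothetical CUSTOMER and ORDERS tables) showing how these relational operations and integrity constraints look in practice:

-- Hypothetical tables illustrating domain, entity and referential integrity constraints
CREATE TABLE CUSTOMER (
    CUSTOMER_ID   INTEGER      PRIMARY KEY,            -- entity integrity (PK)
    CUSTOMER_NAME VARCHAR(100) NOT NULL,               -- domain constraint (type, NOT NULL)
    COUNTRY       VARCHAR(50)
);

CREATE TABLE ORDERS (
    ORDER_ID    INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER REFERENCES CUSTOMER(CUSTOMER_ID),  -- referential integrity (FK)
    ORDER_AMT   DECIMAL(10,2)
);

-- SELECT (restriction), PROJECT (column list) and JOIN expressed declaratively in SQL
SELECT c.CUSTOMER_NAME, o.ORDER_AMT                    -- PROJECT
FROM   CUSTOMER c
JOIN   ORDERS o ON o.CUSTOMER_ID = c.CUSTOMER_ID       -- JOIN
WHERE  c.COUNTRY = 'Nepal';                            -- SELECT (restriction)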
Data vs. Information

• Data:
• Raw facts; building blocks of information
• Unprocessed information
• Information:
• Processed data to reveal meaning
Data vs. Information
Warehouse Terminology

• Dimension
• Attribute
• Facts
• Measures

25
Dimension

• A collection of reference information about a measurable event (fact)


• Categorizes and describes data in order to support meaningful answers to business questions
• Commonly used dimensions
˗ Item (Product)
˗ Store (Organization)
˗ Day (Time)
˗ Supplier
˗ Customer

26
Dimension

• Product
• Company-Division-Group-Department-Class-Subclass-Item
• UDA, Item List, Item Differentiators, Packs

• Organization
• Company-Chain-Area-Region-District-Location
• Channel, Location Traits, Physical/Virtual Warehouse

• Time (Fiscal Calendar)

• Supplier
• Supplier Country, Trait, Currency, Primary Supplier

• Promotion
• Promotion Head, Scheme, Scheme Type, Start and End Dates

27
Product

Product Dimension
• The Product dimension represents the product lines that the company sells.
• Generate Information:
• Item with highest profit,
• Item performance

28
Organization

Store Dimension
• Store Dimension contains store information like city, store name and phone number of the store
where the product was sold.
• Used to track store-specific metrics, such as customer traffic.

29
Time/Calendar

Time Dimension
• Keep track of the time of occurrence of different events in the Retail Organization.
• Analyze the trends over the time

30
4-5-4 Calendar

• Each quarter has 3 months, each containing the respective number of weeks.

• Every quarter contains 13 full weeks.

• Variations of this calendar are 4-4-5, 4-5-4 and 5-4-4,
where the numbers are the number of weeks in the 1st, 2nd and 3rd month of the quarter respectively.

31
13 Period Calendar

• There are 13 periods in a year.

• Each period contains 4 complete weeks (i.e., a 28-day month).
• The 13-period calendar is mostly implemented as an alternative calendar, not as the main calendar.

32
Attributes

Properties that describe an entity

• SIZE (S, M, L, XL,XXL)


• COLOR (black,white,red,blue)
• FABRIC (“100% cotton”, “57% Cotton, 43% Tencel”, “rayon”)
• NECK DESIGN (v-neck, round neck, hi-neck)
• SLEEVE LENGTH (long sleeve, baseball ¾ sleeve, half
sleeve)

33
Business Vs Surrogate Key

• Record
• Business Key: the identifier given to the entity when it was created in the source system. Also known
as an identifier.

• Surrogate Key: A surrogate key is a unique value used to identify an entity in the Data warehouse. A new key is
attached to an entity whenever it is inserted into a data mart dimension table.
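A minimal sketch (hypothetical PRODUCT_DIM table) showing a business key carried over from the source system alongside a warehouse-generated surrogate key:

-- ITEM_ID is the business key from the source system;
-- ROW_WID is the surrogate key assigned when the row enters the data mart dimension table.
CREATE TABLE PRODUCT_DIM (
    ROW_WID   INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
    ITEM_ID   VARCHAR(30) NOT NULL,                              -- business key (source identifier)
    ITEM_DESC VARCHAR(100),
    SBC_DESC  VARCHAR(50)
);

-- The surrogate key is generated automatically; only the business key comes from the source.
INSERT INTO PRODUCT_DIM (ITEM_ID, ITEM_DESC, SBC_DESC)
VALUES ('101201', 'LINEN SHIRT', 'CASUAL');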

34
Hierarchy

Time Hierarchy: Fiscal Year - Fiscal Half Year - Fiscal Quarter - Fiscal Month - Fiscal Week - Day
Product Hierarchy: Company - Division - Group - Department - Class - Subclass - Item
Organization Hierarchy: Company - Chain - Area - Region - District - Location

(Moving up a hierarchy is a roll up; moving down is a drill down.)

35
Fact

• A fact is typically quantifiable or measurable.


• Facts are typically numerical and can be aggregated (sum, average/mean, mode, median,
max/min).
• For Example: Price , SALES_AMOUNT, QUANTITY, DISCOUNT

36
?

PO of
$20000

37
Fact

38
Facts

• Consists of the measurements or metrics of a business process


• Stores quantitative information for analysis and is often denormalized
• Facts have little meaning by themselves because they are usually just values with no qualifiers
• (for example, 20 sales units or 300 units of inventory).
• What gives fact data true meaning is the intersection of multiple dimensions associated with the fact
value.
• Dimension data serves as reference data to facts.

39
Facts

• Standard Facts
• Can be summed up across associated dimensions.
For example, the sum of total sales for each of the days in a week gives the total sales for that week.

• Positional Facts
• The data in positional fact tables reports the state of an entity at a certain point in time rather than
the total activity of an entity; these facts cannot simply be summed over time.
For example, Stock on Hand

40
Facts

• Additive Fact
• Eg. Sales

• Semi-Additive Fact
• Eg. Inventory Position

• Non-Additive Fact
• Eg. Price

41
Measure

• An aggregated or transformed fact, or a calculated measure

• Net Sales Quantity = [Sales_QTY- Return_QTY]


• Net Sales Amount = [Sales_AMT-Return_AMT]
• Gross Sales Amount = [Sales_AMT]
• Stock On Hand = [Stock_In_Store+Stock_In_Transit]
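A minimal sketch of how such measures could be derived at query time; the fact table name below is illustrative, the column names follow the definitions above:

-- Derived measures computed from base fact columns
SELECT
    Sales_QTY - Return_QTY             AS Net_Sales_Quantity,
    Sales_AMT - Return_AMT             AS Net_Sales_Amount,
    Sales_AMT                          AS Gross_Sales_Amount,
    Stock_In_Store + Stock_In_Transit  AS Stock_On_Hand
FROM SALES_STOCK_FACT;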

42
Aggregate Table

• Pre-computed table that stores the result of complex queries.


• Created as per the requirements.
• Helps to increase query performance
• Example
• Total sales of store, grouped by date and month
• Total sales for each product, grouped by date and month
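A minimal sketch of building such an aggregate, assuming a base fact table named SALES_ITM_LOC_DAY_FACT (as used later in this deck) with illustrative column names:

-- Pre-computed aggregate: total sales per store by day
CREATE TABLE SALES_LOC_DAY_AGG AS
SELECT LOC_WID,
       DAY_DT,
       SUM(SLS_QTY) AS SLS_QTY,
       SUM(SLS_AMT) AS SLS_AMT
FROM   SALES_ITM_LOC_DAY_FACT
GROUP BY LOC_WID, DAY_DT;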

43
Data Modeling

• For example: there is a requirement to create a Day-Location-Product wise Sales Report.

• To achieve this, a model must be defined, and relationships need to be mapped between the fact
and dimension tables.

• Here, the primary keys of the dimension tables and the foreign keys of the fact table relate to each
other. This enables us to use the fields of the respective tables together.
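A minimal sketch of the star-schema join behind such a report; the table and column names are illustrative:

-- Day-Location-Product wise sales report: fact table joined to its dimensions
SELECT d.DAY_DT,
       l.LOC_NAME,
       p.ITEM_DESC,
       SUM(f.SLS_QTY) AS TOTAL_UNITS,
       SUM(f.SLS_AMT) AS TOTAL_SALES
FROM   SALES_ITM_LOC_DAY_FACT f
JOIN   DAY_DIM      d ON f.DAY_WID  = d.DAY_WID     -- FK in the fact -> PK in the dimension
JOIN   LOCATION_DIM l ON f.LOC_WID  = l.LOC_WID
JOIN   PRODUCT_DIM  p ON f.ITEM_WID = p.ITEM_WID
GROUP BY d.DAY_DT, l.LOC_NAME, p.ITEM_DESC;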

44
03

Understanding
OLAP
Data Redundancy

• Data redundancy results in data inconsistency


• Different and conflicting versions of the same data appear in
different places
• Errors more likely to occur when complex entries are made in
several different files and/or recur frequently in one or more
files
• Data anomalies develop when required changes in redundant
data are not made successfully
Data Normalization

• First normal form (1NF): data items are atomic


• Second normal form (2NF): attributes fully depend on the primary key.
• Third normal form: all non-key attributes are completely independent of each other.

• Normalization & Its Types with Example: Self Study in details?

47
Data Normalization

• First normal form (1NF): Rules


• Each table cell should contain a single value (it cannot contain two values in one field).
• Each record needs to be unique.

48
Data Normalization

• Second normal form (2 NF):


• Rule 1- Be in 1NF
• Rule 2- No partial dependency: no non-key attribute depends on only part of a
candidate key (often achieved with a single-column primary key)

49
Data Normalization

• Third normal form (3NF): all non-key


attributes are completely independent of each
other.
• Rule 1- Be in 2NF
• Rule 2- Has no transitive functional
dependencies
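A minimal sketch, using hypothetical tables, of removing a transitive dependency to reach 3NF:

-- Not in 3NF: STORE_CITY depends on STORE_ID, which depends on the key SALE_ID (transitive dependency)
CREATE TABLE SALE_FLAT (
    SALE_ID    INTEGER PRIMARY KEY,
    STORE_ID   INTEGER,
    STORE_CITY VARCHAR(50),
    SLS_AMT    DECIMAL(10,2)
);

-- 3NF decomposition: move the store attributes into their own table
CREATE TABLE STORE (
    STORE_ID   INTEGER PRIMARY KEY,
    STORE_CITY VARCHAR(50)
);

CREATE TABLE SALE (
    SALE_ID  INTEGER PRIMARY KEY,
    STORE_ID INTEGER REFERENCES STORE(STORE_ID),
    SLS_AMT  DECIMAL(10,2)
);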

50
OLTP

• OLTP: On Line Transaction Processing


• Describes processing at operational sites
• Facilitate and manage transaction-oriented applications, typically for data entry and
retrieval transaction processing
• Main emphasis for OLTP systems is put on very fast query processing, maintaining data
integrity in multi-access environments

51
OLAP

• OLAP: On Line Analytical Processing


• Describes processing at the warehouse
• An element of decision support systems (DSS)
• An approach to answering multidimensional (MDA) queries swiftly
• Also encompasses relational databases, report writing and data mining
• Enables users to easily and selectively view data from different points of view
• A way of making transactional data usable and understandable for decision making

52
OLAP vs OLTP

OLTP OLAP

• Mostly updates • Mostly reads


• Many small transactions • Queries are long and complex
• MB - GB of data • GB - TB of data
• Current snapshot • History
• Index/hash • Lots of scans
• Raw data • Summarized, reconciled data
• Thousands of users (e.g., clerical users) • Hundreds of users (e.g., decision-makers, analysts)
• OLTP systems are used to “run” a business • OLAP helps to “optimize” the business

53
How Do Data Warehouses Differ From Operational Systems?

• Goals
• Structure
• Size
• Performance optimization
• Technologies used

54
Comparison Chart of Database Types

Data warehouse                                              Operational system

Subject oriented                                            Transaction oriented

Large (hundreds of GB up to several TB)                     Small (MB up to several GB)

Historic data                                               Current data

De-normalized table structure (few tables, many columns per table)     Normalized table structure (many tables, few columns per table)

Batch updates                                               Continuous updates

Usually very complex queries                                Simple to complex queries


Design Differences

Operational System Data Warehouse

ER Diagram Star Schema

56
Data Warehouses, Data Marts, and Operational Data Stores

• Data Warehouse – The queryable source of data in the enterprise. It comprises the
union of all of its constituent data marts.
• Data Mart – A logical subset of the complete data warehouse. Often viewed as a
restriction of the data warehouse to a single business process or to a group of related
business processes targeted toward a particular business group.

SOURCE: Ralph Kimball


57
Data Warehouses, Data Marts, and Operational Data Stores

• Operational Data Store (ODS) – A point of integration for operational systems that were
developed independently of each other. Since an ODS supports day-to-day operations, it
needs to be continually updated.

58
04

Approaches to Data Warehousing Models
Approaches to Data Warehousing

Kimball Model:
Approaches to Data Warehousing

Inmon Model:
Approaches to Data Warehousing

Kimball Data Warehouse Lifecycle:


05

Data analysis, extraction, transformation and loading (ETL)
Key Terms
• Record Open Date
• Effective start date of the dimension record within the data warehouse system. This date is the
same as the Effective Date from the source, if available.
• Record Close Date
• Effective end date of the dimension record within the data warehouse system. This date is the
same as the Record Close Date in the source system, if available.
• Record Close Flag (CURRENT_FLG)
• This flag indicates if the record is current within the data warehouse or not. Valid values
include 'Y' or 'N'
• Insert Timestamp
• This is the date on which the dimension record was first inserted into the data warehouse
system.
• Update Timestamp
• This is last date when the dimension record was updated within the data warehouse
system.

64
Key Terms

• Rollup Table
• Dimension Identity/Fact Identities
• Dimension Keys

65
Flow to Data Warehouse

66
ETL Extraction Transformation Load

• To get data out of the source and load it into the data warehouse – simply a process of copying data
from one database to another

• Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded
into the data warehouse database

• Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems,
and spreadsheets; such data also requires extraction, transformation, and loading

67
Data Flow in Warehouse

Source tables / flat files -> Source temp tables (batch) -> Staging tables -> Warehouse temp tables (batch) -> Warehouse tables

68
Flow to Data Warehouse

• Source Dependent Extract (SDE): In this phase, SDE tasks extract data from the source system and
stage it in staging tables. SDE tasks are source specific.
• Source Independent Load (SIL): Load tasks transform and port the data from staging tables to base
fact or dimension tables. SIL tasks are source independent.
• Post Load Process (PLP): PLP tasks are only executed after the dimension and fact tables are populated.
A typical usage of a PLP task is to transform data from a base fact table and load it into an aggregate
table. PLP tasks are source independent.

69
Delta vs Full snapshot

• Delta
– Extracts only rows with data changes since the last extract
– Used when relatively few rows change compared to the total number of rows in a table
– Records for deleted rows in source tables are maintained in a separate table
• Full Snapshot
– Extracts all the rows from the required source tables, irrespective of changes since
the last extract
– Used when a large proportion of rows change compared to all rows in a table
– Target tables are either truncated and loaded with the latest source full snapshot
(generally for dimensions),
– or rows matching business keys between source and target tables are updated per the
latest source full snapshot, and rows that no longer exist in the source extract are
considered deleted from the source, with the necessary delete processing applied to
such target tables
– No need for separate tables for deleted rows
– Time consuming if the volume of changes is high
70
Flow to Data Warehouse

• Factors to determine the best method

• volume of data that the client has
• how often changes are made to these data
• E.g.: the Product dimension can be loaded with either a delta or a full snapshot.
• For a daily batch run, it is better to load the product dimension with a delta, as the dimension change
is very low compared to the total records.
• LAST_UPDATE_DATETIME > SYSDATE - 2 and STATUS = 'A' from the source table
• In full snapshot loading of the product dimension, all records from the source system are updated
into the target table.
• A full snapshot of the product dimension is performed during initial seeding, before go-live.
• The Organization dimension is processed with a full snapshot load, as the organization records are few
in comparison to the product dimension.
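A minimal sketch of a delta extract for the product dimension, assuming a hypothetical source table PRODUCT_SRC with the LAST_UPDATE_DATETIME and STATUS columns used in the filter above (Oracle-style SYSDATE):

-- Delta extract: only rows changed in roughly the last two days, active records only
INSERT INTO PRODUCT_STG (ITEM_ID, ITEM_DESC, SBC_DESC, LAST_UPDATE_DATETIME)
SELECT ITEM_ID, ITEM_DESC, SBC_DESC, LAST_UPDATE_DATETIME
FROM   PRODUCT_SRC
WHERE  LAST_UPDATE_DATETIME > SYSDATE - 2
AND    STATUS = 'A';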

71
Terminology

• Staging
• Temporary
• Target
• Standard / Incremental Update and Insert
• Reclassification

72
Dimension Load Strategy

73
Slowly Changing Dimension

• SCD Type 1 (Minor Change)


• when the history of the changes to the dimension is not essential to be tracked

• SCD Type 2 and above (Major Change)


• when the history of the dimension change needs to be tracked
For e.g.: SCD Type 2, SCD Type 3, SCD Type 4, SCD Type 6

74
Minor Change

● Original Name: Bib-Label Lithiated Lemon-Lime Soda


● Changed to 7UP in 1936
● Just a description change.

ID Name Size Current Flag Created Date Updated Date

101201 Bib-Label Lithiated 50 ml Y 1929-06-19 1929-06-19


Lemon-Lime Soda

ID Name Size Current Flag Created Date Updated Date


101201 7 UP 50 ml Y 1929-06-19 1936-06-23
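A minimal sketch of this SCD Type 1 change as a SQL update, using an illustrative ITEM_DIM table with the columns shown above:

-- SCD Type 1: overwrite the attribute in place; no history is kept
UPDATE ITEM_DIM
SET    NAME         = '7 UP',
       UPDATED_DATE = DATE '1936-06-23'
WHERE  ID = 101201;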

75
SCD2 Major Change

Class Shirt

Sub-Class Casual Formal

Item Linen Shirt Linen Shirt

● Item Linen Shirt has been moved from Sub-category Casual to Formal.

76
SCD Type 2 Major Change...

Before Change

ITEM TABLE

ROW_WID SCD_WID ITM_DESC SBC_WID SBC_DESC CURR_FLG EFFECTIVE_FRM_DT EFFECTIVE_TO_DT

101 1001 LINEN SHIRT 101 CASUAL Y 1/1/2024 12/31/9999

102 1002 POLO SHIRT 101 CASUAL Y 1/1/2024 12/31/9999

103 1003 T-SHIRT 101 CASUAL Y 1/1/2024 12/31/9999

104 1004 DRESS SHIRT 102 FORMAL Y 1/1/2024 12/31/9999

105 1005 TUXEDO SHIRT 102 FORMAL Y 1/1/2024 12/31/9999

77
SCD Type 2 Major Changes…

LINEN SHIRT RECLASSED FROM CASUAL TO FORMAL ON 1/3/2024

ITEM TABLE

ROW_WID SCD_WID ITM_DESC SBC_WID SBC_DESC CURR_FLG EFFECTIVE_FRM_DT EFFECTIVE_TO_DT

101 1001 LINEN SHIRT 101 CASUAL N 1/1/2024 1/2/2024

102 1002 POLO SHIRT 101 CASUAL Y 1/1/2024 12/31/9999

103 1003 T-SHIRT 101 CASUAL Y 1/1/2024 12/31/9999

104 1004 DRESS SHIRT 102 FORMAL Y 1/1/2024 12/31/9999

105 1005 TUXEDO SHIRT 102 FORMAL Y 1/1/2024 12/31/9999

106 1001 LINEN SHIRT 102 FORMAL Y 1/3/2024 12/31/9999
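A minimal sketch of the SCD Type 2 logic behind this change, using the same illustrative ITEM TABLE columns:

-- 1. Expire the current version of the reclassified item
UPDATE ITEM_TABLE
SET    CURR_FLG        = 'N',
       EFFECTIVE_TO_DT = DATE '2024-01-02'
WHERE  SCD_WID  = 1001
AND    CURR_FLG = 'Y';

-- 2. Insert a new version with a new surrogate key and the new subclass
INSERT INTO ITEM_TABLE
       (ROW_WID, SCD_WID, ITM_DESC, SBC_WID, SBC_DESC, CURR_FLG, EFFECTIVE_FRM_DT, EFFECTIVE_TO_DT)
VALUES (106, 1001, 'LINEN SHIRT', 102, 'FORMAL', 'Y', DATE '2024-01-03', DATE '9999-12-31');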

78
SCD Type 3 Major Changes…

Before Change

ITEM TABLE

ROW_WID SCD_WID ITM_DESC SBC_WID SBC_DESC PREV_SBC_WID PREV_SBC_DESC

101 1001 LINEN SHIRT 101 CASUAL Null Null

102 1002 POLO SHIRT 101 CASUAL Null Null

103 1003 T-SHIRT 101 CASUAL Null Null

104 1004 DRESS SHIRT 102 FORMAL Null Null

105 1005 TUXEDO SHIRT 102 FORMAL Null Null

79
SCD Type 3 Major Changes…

LINEN SHIRT RECLASSED FROM CASUAL TO FORMAL ON 1/3/2024

ITEM TABLE

ROW_WID SCD_WID ITM_DESC SBC_WID SBC_DESC PREV_SBC_WID PREV_SBC_DESC

101 1001 LINEN SHIRT 102 FORMAL 101 CASUAL

102 1002 POLO SHIRT 101 CASUAL Null Null

103 1003 T-SHIRT 101 CASUAL Null Null

104 1004 DRESS SHIRT 102 FORMAL Null Null

105 1005 TUXEDO SHIRT 102 FORMAL Null Null
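A minimal sketch of the SCD Type 3 update, which overwrites the current value and keeps only the previous one in dedicated columns:

-- SCD Type 3: move the current subclass into the PREV_ columns, then overwrite it
UPDATE ITEM_TABLE
SET    PREV_SBC_WID  = SBC_WID,
       PREV_SBC_DESC = SBC_DESC,
       SBC_WID       = 102,
       SBC_DESC      = 'FORMAL'
WHERE  SCD_WID = 1001;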

80
Reporting Approach for Reclassification

• AS WAS
• History is associated with the dimension hierarchy that was effective at the time of the
transaction.

• AS IS
• History is restated according to the current dimension hierarchy as it exists after the
reclassification occurred.

• PIT ( Point in Time )


• When multiple reclassifications have occurred, history can be displayed under any of the
past dimension hierarchies.

81
Fact Load Strategy

82
Fact Table Types: Base and Aggregate

• A base fact table holds fact data for a given functional area at the lowest level of granularity
Example : SALES_ITM_LOC_DAY_FACT
Stores sales fact data at the item/location/day/transaction/voucher level.

• An aggregate table holds data for a given functional area which is summarized or aggregated across one
or more dimensions.
Example : SALES_SBC_LOC_DAY_AGG
Stores sales fact data at the subclass/location/day level.

83
Aggregated Tables

• Aggregations and Partitions


• Default aggregation levels:
• Item-Location-Day
• Item-Location-Week
• Subclass-Location-Day
• Subclass-Location-Week
• More aggregates can be custom built to meet specific client needs
• Any level of partitioning can be set as needed

84
Aggregated Tables

• Pre-computed table that stores the result of complex queries.


• Created as per the requirements.
• Helps to increase query performance
• Example
• Total sales of store, grouped by date and month
• Total sales for each product, grouped by date and month

85
Major Change impact in Fact

ITEM SALES TABLE


ROW_WID ITM_WID ITM_SCD_WID SLS_QTY SLS_AMT DATE
1 101 1001 10 12000 1/2/2024
2 102 1002 15 12750 1/2/2024
3 103 1003 12 9000 1/2/2024
4 105 1005 3 6000 1/2/2024

SUBCLASS SALES TABLE


ROW_WID SBC_WID SLS_QTY SLS_AMT DATE

1 101 37 33750 1/2/2024

2 102 3 6000 1/2/2024

86
Major Change impact in Fact

ITEM SALES TABLE

ROW_WID ITM_WID ITM_SCD_WID SLS_QTY SLS_AMT DATE


1 101 1001 10 12000 1/2/2024
2 102 1002 15 12750 1/2/2024
3 103 1003 12 9000 1/2/2024
4 105 1005 3 6000 1/2/2024
5 106 1001 8 9600 1/3/2024
6 103 1003 10 7500 1/3/2024
7 104 1004 4 4000 1/3/2024

87
Major Change impact in Fact

SUBCLASS SALES AS WAS TABLE


ROW_WID SBC_WID SLS_QTY SLS_AMT DATE
1 101 37 33750 1/2/2024
2 102 3 6000 1/2/2024
3 101 10 7500 1/3/2024
4 102 12 13600 1/3/2024

SUBCLASS SALES AS IS TABLE


ROW_WID SBC_WID SLS_QTY SLS_AMT DATE
1 101 37 33750 1/2/2024
2 102 3 6000 1/2/2024
3 101 -10 -12000 1/2/2024
4 102 10 12000 1/2/2024
5 101 10 7500 1/3/2024
6 102 12 13600 1/3/2024

88
Major Change impact in Fact

SUBCLASS SALES AS WAS REPORT


SUBCLASS DATE TOTAL UNIT TOTAL SALES
CASUAL 1/2/2024 37 33750
FORMAL 1/2/2024 3 6000
CASUAL 1/3/2024 10 7500
FORMAL 1/3/2024 12 13600

SUBCLASS SALES AS IS REPORT


SUBCLASS DATE TOTAL UNIT TOTAL SALES
CASUAL 1/2/2024 27 21750
FORMAL 1/2/2024 13 18000
CASUAL 1/3/2024 10 7500
FORMAL 1/3/2024 12 13600

89
Standard Fact Aggregation

• Simple aggregates that are aggregated through summation of measures across associated dimensions.
• As Was Aggregates
• Aggregates across Organization, Product or Calendar dimension using hierarchy that exists at time of
roll up.
• Item, Location and Day level -> Sub-class, Location and Day level: the subclass for the item is
determined from the relationship that exists on the day of roll-up
• Corporate Aggregates
• High level aggregates for Top Level Executives.
• No Organization level information.

90
Positional Fact Aggregation

• Contains information about an entity’s position at a given point in time.

• Cannot be aggregated by simply summing up the data.

• Functions like max() , min() , first() come in use.
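A minimal sketch, using a hypothetical STOCK_ITM_LOC_DAY_FACT table, of rolling a positional fact up to the week: the weekly stock on hand is the closing position (last daily snapshot of the week), not the sum of the daily values:

-- Positional aggregation: pick the last daily snapshot in each week instead of summing
SELECT f.ITEM_WID,
       f.LOC_WID,
       f.WEEK_ID,
       f.STOCK_ON_HAND AS WEEK_END_STOCK_ON_HAND
FROM   STOCK_ITM_LOC_DAY_FACT f
WHERE  f.DAY_DT = (SELECT MAX(d.DAY_DT)
                   FROM   STOCK_ITM_LOC_DAY_FACT d
                   WHERE  d.ITEM_WID = f.ITEM_WID
                   AND    d.LOC_WID  = f.LOC_WID
                   AND    d.WEEK_ID  = f.WEEK_ID);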

91
Snowflake Schema

Order Fact: DateKey, ProductKey, CustomerKey, StoreKey
• Calendar Date (DateKey, MonthKey, ...) -> Month (MonthKey, MonthNumber, ...)
• Product (ProductKey, BrandKey, SKU, ...) -> Brand (BrandKey, CategoryKey, BrandName, ...) -> ProductCategory (CategoryKey, CategoryName, Description, ...)
• Store (StoreKey, BranchKey, Code, ...) -> Branch (BranchKey, Name, ...)
• Customer (CustomerKey, ...)

An example of a snowflake schema to represent the placing-an-order business process
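A minimal sketch of a query over this snowflake schema; the table and key names follow the diagram (the fact table is written here as OrderFact), everything else is illustrative. Note the extra joins through the normalized Brand and ProductCategory tables:

-- Order count by product category: the Product dimension is snowflaked into Brand and ProductCategory
SELECT pc.CategoryName,
       COUNT(*) AS ORDER_COUNT
FROM   OrderFact       f
JOIN   Product         p  ON f.ProductKey  = p.ProductKey
JOIN   Brand           b  ON p.BrandKey    = b.BrandKey
JOIN   ProductCategory pc ON b.CategoryKey = pc.CategoryKey
GROUP BY pc.CategoryName;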
Rejection Handling

• The Retail Analytics base fact loading program checks dimension data integrity against the
dimension tables
• Any source data that violates dimension data integrity is rejected
• The rejected records are written to the respective rejection tables.
• Rejected data will not be loaded into Retail Analytics. The data must be manually cleaned up
in the source table and the program rerun.

93
06

Q&A
