DATA WAREHOUSE
A data warehouse is kept separate from the operational DBMS. It stores a huge amount of data, which is typically collected from multiple heterogeneous sources such as files, DBMSs, etc.
The goal is to produce statistical results that may help in
decision-making. For example, a college might want to quickly see different results, such as how the placement of CS students has improved over the last 10 years in terms of salaries, counts, etc.
Issues that Occur while Building the Warehouse:
When and how to gather data: In a source-driven
architecture for gathering data, the data sources
transmit new information, either continually (as
transaction processing takes place), or
periodically (nightly, for example).
In a destination-driven architecture, the data warehouse
periodically sends requests for new data to the sources.
Unless updates at the sources are replicated at the warehouse via two-phase commit, the warehouse will
never be quite up to date with the sources. Two-phase
commit is usually far too expensive to be an option, so
data warehouses typically have slightly out-of-date data.
That, however, is usually not a problem for decision-
support systems.
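The destination-driven style can be pictured with a short Python sketch. This is only an illustration: the fetch_new_rows() helper on each source object and the sales table in the warehouse are assumptions, not part of any standard API.

```python
# Illustrative sketch of a destination-driven (pull-based) refresh; the
# fetch_new_rows() helper and the sales table are hypothetical.
import sqlite3

def refresh_warehouse(sources, warehouse: sqlite3.Connection) -> None:
    """One periodic (e.g. nightly) pull: ask every source for its new rows."""
    for source in sources:
        rows = source.fetch_new_rows()   # hypothetical: rows added since the last pull
        warehouse.executemany(
            "INSERT INTO sales (item, category, amount) VALUES (?, ?, ?)", rows
        )
    warehouse.commit()  # between refreshes the warehouse lags the sources slightly
```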
What schema to use: Data sources that have been
constructed independently are likely to have
different schemas.
In fact, they may even use different data models. Part of
the task of a warehouse is to perform schema integration,
and to convert data to the integrated schema before they
are stored.
As a result, the data stored in the warehouse are not just a
copy of the data at the sources. Instead, they can be
thought of as a materialized view of the data at the
sources.
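As a small, hedged sketch of schema integration (the column names are invented for illustration), two independently designed sources can be renamed into a common schema before being stored, so the warehouse content behaves like a materialized view over the sources:

```python
# Two sources describing the same placement information under different schemas
# are converted to one integrated schema before being stored in the warehouse.
import pandas as pd

src1 = pd.DataFrame({"stud_name": ["Asha", "Ravi"], "pkg_lakhs": [12.0, 8.5]})
src2 = pd.DataFrame({"name": ["Meena"], "salary_lpa": [10.0]})

integrated = pd.concat(
    [
        src1.rename(columns={"stud_name": "student", "pkg_lakhs": "salary_lpa"}),
        src2.rename(columns={"name": "student"}),
    ],
    ignore_index=True,
)
print(integrated)
```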
Data transformation and cleansing: The task of
correcting and preprocessing data is called
data cleansing. Data sources often deliver data
with numerous minor inconsistencies, which
can be corrected.
For example, names are often misspelled, and addresses
may have street, area, or city names misspelled, or postal
codes entered incorrectly.
These can be corrected to a reasonable extent by consulting
a database of street names and postal codes in each city.
The approximate matching of data required for this task is
referred to as fuzzy lookup.
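A minimal sketch of fuzzy lookup using Python's standard difflib module; the city list and the similarity cutoff are illustrative choices, not values from the text:

```python
# Correct misspelled city names by approximate matching against a reference list.
from difflib import get_close_matches

valid_cities = ["Mumbai", "Delhi", "Bengaluru", "Hyderabad", "Chennai"]

def clean_city(raw_name: str) -> str:
    """Return the closest valid city name, or the input if no good match exists."""
    matches = get_close_matches(raw_name, valid_cities, n=1, cutoff=0.75)
    return matches[0] if matches else raw_name

print(clean_city("Mumbia"))   # corrected to Mumbai
print(clean_city("Chenai"))   # corrected to Chennai
```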
How to propagate updates: Updates on relations at the
data sources must be propagated to the data
warehouse. If the relations at the data warehouse are
exactly the same as those at the data source, the
propagation is straightforward.
If they are not, the problem of propagating updates is
basically the view-maintenance problem.
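The view-maintenance idea can be sketched as follows: when a sale is inserted at a source, the warehouse's summary relation is updated incrementally rather than recomputed from scratch. The table and column names are illustrative assumptions:

```python
# Incremental maintenance of a materialized aggregate (total sales per item/category).
from collections import defaultdict

total_sales = defaultdict(float)   # warehouse-side summary relation

def propagate_insert(item: str, category: str, amount: float) -> None:
    """Apply one source-side insert to the warehouse-side summary."""
    total_sales[(item, category)] += amount

propagate_insert("T-shirt", "clothing", 499.0)
propagate_insert("T-shirt", "clothing", 599.0)
propagate_insert("Jeans", "clothing", 1299.0)
print(dict(total_sales))
```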
What data to summarize: The raw data generated by a
transaction-processing system may be too large to
store online.
However, we can answer many queries by maintaining just
summary data obtained by aggregation on a relation,
rather than maintaining the entire relation. For example,
instead of storing data about every sale of clothing, we
can store total sales of clothing by item name and
category.
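A short pandas sketch of this kind of summarization (the sales figures are made up): instead of one row per sale, the warehouse keeps total sales by item name and category:

```python
# Summarize individual clothing sales into totals per (item, category).
import pandas as pd

sales = pd.DataFrame({
    "item":     ["T-shirt", "T-shirt", "Jeans", "Jeans", "Jacket"],
    "category": ["clothing"] * 5,
    "amount":   [499.0, 599.0, 1299.0, 1499.0, 2999.0],
})

summary = sales.groupby(["item", "category"], as_index=False)["amount"].sum()
print(summary)   # one row per (item, category) instead of one row per sale
```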
Benefits of a Data Warehouse:
Better business analytics: A data warehouse plays an important role in a business by storing and analyzing all of the company's past data and records, which further improves the company's understanding and analysis of its data.
Faster queries: A data warehouse is designed to handle large analytical queries, which is why it runs such queries faster than an operational database.
Improved data quality: The data gathered from different sources is stored and analyzed in the warehouse without the warehouse altering or adding data on its own, so data quality is maintained; if any data quality issue arises, the data warehouse team resolves it.
Historical insight: The warehouse stores all historical data, which contains details about the business, so that one can analyze it at any time and extract insights from it.
MULTIDIMENSIONAL DATA MODEL
The Multidimensional Data Model is a method used for organizing data in a database, with a good arrangement and assembly of the database's contents.
The Multidimensional Data Model allows users to ask analytical questions associated with market or business trends, unlike relational databases, which allow users to access data only in the form of individual queries.
It allows users to receive answers to their requests rapidly, since the data can be created and examined comparatively quickly.
Working of the Multidimensional Data Model:
A Multidimensional Data Model is built by following a set of pre-decided steps.
The following stages should be followed by every project that builds a Multidimensional Data Model:
Stage 1: Assembling data from the client: In the first stage, correct data is collected from the client. Typically, software professionals explain to the client, in simple terms, the range of data that can be obtained with the selected technology, and then collect the complete data in detail.
Stage 2: Grouping different segments of the system: In the second stage, all the data is recognized and classified into the respective segments it belongs to, which also makes it problem-free to apply step by step.
Stage 3: Noticing the different proportions: The third stage forms the basis on which the design of the system rests. Here, the main factors are recognized according to the user's point of view. These factors are also known as "dimensions".
Stage 4: Preparing the actual-time factors and their respective qualities: In the fourth stage, the factors recognized in the previous step are used to identify the related qualities. These qualities are also known as "attributes" in the database.
Stage 5: Finding the actuality (facts) of the factors listed previously and their qualities: In the fifth stage, the facts are separated and distinguished from the factors collected so far. These facts play a significant role in the arrangement of a Multidimensional Data Model.
Stage 6: Building the schema to place the data, with respect to the information collected in the steps above: In the sixth stage, a schema is built on the basis of the information collected previously.
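One common way to realize such a schema is a star-style layout with a fact table and dimension tables. The sketch below mocks this up with pandas DataFrames; the table and column names are illustrative assumptions, not part of the stages above:

```python
# A star-style schema: dimension tables hold attributes, the fact table holds
# foreign keys plus measures, and analytical questions join and aggregate them.
import pandas as pd

dim_item = pd.DataFrame({"item_id": [1, 2], "item": ["T-shirt", "Jeans"],
                         "category": ["clothing", "clothing"]})
dim_time = pd.DataFrame({"time_id": [1, 2], "year": [2023, 2024]})

fact_sales = pd.DataFrame({"item_id": [1, 1, 2], "time_id": [1, 2, 2],
                           "amount": [499.0, 599.0, 1299.0]})

report = (fact_sales.merge(dim_item, on="item_id")
                    .merge(dim_time, on="time_id")
                    .groupby(["category", "year"], as_index=False)["amount"].sum())
print(report)
```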
DATA CLEANING
Data cleaning is the process of correcting or deleting
inaccurate, damaged, improperly formatted, duplicated, or
insufficient data from a dataset.
Even if results and algorithms appear to be correct, they
are unreliable if the data is inaccurate.
There are numerous ways for data to be duplicated or
incorrectly labeled when merging multiple data sources.
In general, data cleaning lowers errors and raises the
caliber of the data. Although it might be a time-consuming
and laborious operation, fixing data mistakes and
removing incorrect information must be done.
Data mining, a method for finding useful information in data, is a crucial aid in cleaning up data.
Data quality mining is a novel methodology that uses data
mining methods to find and fix data quality issues in
sizable databases.
Data mining automatically extracts intrinsic and hidden information from large data sets. Data cleansing can be accomplished using a variety of data mining approaches.
Steps for Cleaning Data
You can follow these fundamental stages to clean your data, even though the techniques employed may vary depending on the sorts of data your firm stores:
1. Remove duplicate or irrelevant observations
Remove duplicate or pointless observations as well as
undesirable observations from your dataset. The majority
of duplicate observations will occur during data gathering.
Duplicate data can be produced when you merge data
sets from several sources, scrape data, or get data from
clients or other departments.
De-duplication is one of the most important factors to take into account in this procedure. Observations are deemed irrelevant when they do not pertain to the particular issue you are attempting to analyze; a short sketch of both removals follows.
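A minimal pandas sketch of this step; the column names and the retail-only filter are assumptions made for illustration:

```python
# Drop exact duplicate rows, then drop observations irrelevant to the analysis.
import pandas as pd

df = pd.DataFrame({
    "customer": ["Asha", "Asha", "Ravi", "Meena"],
    "city":     ["Delhi", "Delhi", "Mumbai", "Chennai"],
    "segment":  ["retail", "retail", "retail", "wholesale"],
})

df = df.drop_duplicates()             # remove duplicate observations
df = df[df["segment"] == "retail"]    # keep only observations relevant to a retail-only analysis
print(df)
```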
2. Fix structural errors
Structural errors arise when you measure or transfer data and notice odd naming conventions, typos, or inconsistent capitalization. These inconsistencies can result in mislabelled categories or classes.
For instance, "N/A" and "Not Applicable" might be
present on any given sheet, but they ought to be
analyzed under the same heading.
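A small sketch of fixing such structural errors with pandas, where mapping everything onto "N/A" is simply the choice made for illustration:

```python
# Normalize capitalization and map label variants onto one consistent category.
import pandas as pd

df = pd.DataFrame({"status": ["N/A", "Not Applicable", "n/a", "Approved", "approved"]})

df["status"] = (df["status"]
                .str.strip()
                .str.lower()
                .replace({"not applicable": "n/a"})   # analyze variants under one heading
                .str.title())                         # restore consistent capitalization
print(df["status"].value_counts())
```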
3. Filter unwanted outliers
There will frequently be isolated findings that, at first
glance, do not seem to fit the data you are analyzing.
If you have a good reason to remove an outlier, such as incorrect data entry, doing so will improve the performance of the analysis you are working on.
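One common rule of thumb (not prescribed by the text) is to drop values lying more than 1.5 times the interquartile range outside the quartiles, sketched here with pandas:

```python
# Filter rows whose amount falls outside the 1.5 * IQR bounds.
import pandas as pd

df = pd.DataFrame({"amount": [480, 510, 495, 505, 500, 9999]})  # 9999 looks like a data-entry error

q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(df[mask])   # the suspicious 9999 row is filtered out
```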
4. Handle missing data
Because many algorithms won't tolerate missing values,
you can't overlook missing data. There are a few options
for handling missing data. While none of them is ideal, each can be taken into account, for example:
Although you can remove observations with missing
values, doing so will result in the loss of information, so
proceed with caution.
Again, there is a chance of undermining the integrity of the data, since you may be working from assumptions rather than actual observations when you fill in missing values based on other observations.
To handle null values effectively, you may also need to change the way the data is used.
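The three options above can be sketched briefly with pandas; which one is appropriate depends on the analysis, so treat this purely as an illustration:

```python
# Three ways of handling missing values: drop, impute, or flag and keep.
import pandas as pd

df = pd.DataFrame({"age": [25, None, 40], "salary_lpa": [8.0, 10.0, None]})

dropped = df.dropna()                                        # option 1: drop rows with missing values
imputed = df.fillna(df.mean(numeric_only=True))              # option 2: impute from other observations (column means)
flagged = df.assign(salary_missing=df["salary_lpa"].isna())  # option 3: change how the data is used (keep a flag)

print(dropped, imputed, flagged, sep="\n\n")
```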
DATA INTEGRATION AND TRANSFORMATION
Data Integration is a data preprocessing technique that
combines data from multiple heterogeneous data sources
into a coherent data store and provides a unified view of
the data. These sources may include multiple data cubes,
databases, or flat files.
The data integration approach is formally defined as a triple <G, S, M>, where:
G stands for the global schema,
S stands for the heterogeneous source schemas, and
M stands for the mappings between queries over the source schemas and the global schema.
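A toy sketch of the <G, S, M> triple in Python: S holds source tables with their own schemas, M records how each source's columns map onto the global schema G, and the unified view is produced by applying M. All names and values are invented for illustration:

```python
# Apply per-source mappings M to sources S to obtain data under global schema G.
import pandas as pd

G = ["student", "dept", "salary_lpa"]                       # global schema

S = {
    "placements_csv": pd.DataFrame({"stud_name": ["Asha"], "branch": ["CS"], "pkg": [12.0]}),
    "hr_db":          pd.DataFrame({"name": ["Ravi"], "department": ["IT"], "ctc_lpa": [9.0]}),
}

M = {                                                        # per-source column mappings
    "placements_csv": {"stud_name": "student", "branch": "dept", "pkg": "salary_lpa"},
    "hr_db":          {"name": "student", "department": "dept", "ctc_lpa": "salary_lpa"},
}

unified = pd.concat(
    [table.rename(columns=M[name])[G] for name, table in S.items()],
    ignore_index=True,
)
print(unified)
```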
Data integration can be challenging due to the variety of
data formats, structures, and semantics used by different
data sources.
Different data sources may use different data types,
naming conventions, and schemas, making it difficult to
combine the data into a single view.
Data integration typically involves a combination of
manual and automated processes, including data profiling,
data mapping, data transformation, and data reconciliation.
Data Integration Approaches
There are mainly two types of approaches for data
integration. These are as follows:
Tight Coupling
It is the process of using ETL (Extraction, Transformation,
and Loading) to combine data from various sources into a
single physical location.
Loose Coupling
With loose coupling, the data is kept in the actual source databases. This approach provides an interface that takes a query from the user, converts it into a format that the source databases can understand, and then sends the query directly to the source databases to obtain the result.
Issues in Data Integration:
There are several issues that can arise when integrating
data from multiple sources, including:
Data Quality: Inconsistencies and errors in the data
can make it difficult to combine and analyze.
Data Semantics: Different sources may use different
terms or definitions for the same data, making it
difficult to combine and understand the data.
Data Heterogeneity: Different sources may use
different data formats, structures, or schemas, making
it difficult to combine and analyze the data.
Data Privacy and Security: Protecting sensitive
information and maintaining security can be
difficult when integrating data from multiple
sources.
Scalability: Integrating large amounts of data from
multiple sources can be computationally
expensive and time-consuming.
Data Governance: Managing and maintaining the
integration of data from multiple sources can be
difficult, especially when it comes to ensuring
data accuracy, consistency, and timeliness.
Performance: Integrating data from multiple
sources can also affect the performance of the
system.
Integration with existing systems: Integrating new
data sources with existing systems can be a
complex task, requiring significant effort and
resources.
Complexity: The complexity of integrating data
from multiple sources can be high, requiring
specialized skills and knowledge.
DATA TRANSFORMATION
Data transformation in data mining refers to the process of
converting raw data into a format that is suitable for
analysis and modeling.
The goal of data transformation is to prepare the data for
data mining so that it can be used to extract useful insights
and knowledge. Data transformation typically involves
several steps, including:
Data cleaning: Removing or correcting errors,
inconsistencies, and missing values in the
data.
Data integration: Combining data from multiple
sources, such as databases and spreadsheets, into a
single format.
Data normalization: Scaling the data to a common
range of values, such as between 0 and 1, to facilitate
comparison and analysis.
Data reduction: Reducing the dimensionality of the
data by selecting a subset of relevant features or
attributes.
Data discretization: Converting continuous data
into discrete categories or bins.
Data aggregation: Combining data at different
levels of granularity, such as by summing or
averaging, to create new features or attributes.
Data transformation is an important step in the data
mining process as it helps to ensure that the data is in a
format that is suitable for analysis and modeling, and that
it is free of errors and inconsistencies.
Data transformation can also help to improve the
performance of data mining algorithms, by reducing the
dimensionality of the data, and by scaling the data to a
common range of values.
The data is transformed in ways that are ideal for mining it. Data transformation involves the following steps:
1. Smoothing: Smoothing is a process used to remove noise from the dataset using some algorithm. It allows the important features present in the dataset to be highlighted and helps in predicting patterns. When collecting data, the data can be manipulated to eliminate or reduce variance or any other form of noise (see the sketch after this list).
2. Aggregation: Data aggregation is the method of storing and presenting data in a summary format. The data may be obtained from multiple data sources and integrated into a single data analysis description. This is a crucial step, since the accuracy of data analysis insights is highly dependent on the quantity and quality of the data used. Gathering accurate, high-quality data in a large enough quantity is necessary to produce relevant results.
3. Discretization: Discretization is a process of transforming continuous data into a set of small intervals. Most real-world data mining activities involve continuous attributes, yet many existing data mining frameworks are unable to handle them. Moreover, even when a data mining task can manage a continuous attribute, its efficiency can be improved significantly by replacing a continuous-valued attribute with its discrete values, for example age intervals (1-10, 11-20, ...) or labels (young, middle age, senior); see the sketch after this list.
4. Attribute Construction: New attributes are constructed from the given set of attributes and applied to assist the mining process. This simplifies the original data and makes mining more efficient.
5. Generalization: Generalization converts low-level data attributes into high-level data attributes using a concept hierarchy. For example, age initially in numerical form (22, 25) is converted into categorical values (young, old). Similarly, categorical attributes such as house addresses may be generalized to higher-level definitions such as town or country.
6. Normalization: Data normalization involves converting all data variables into a given range, such as 0 to 1.
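A combined sketch of smoothing, discretization, and normalization on one small column, as referenced in steps 1, 3, and 6 above; the window size, bin edges, and labels are illustrative choices:

```python
# Smoothing (moving average), discretization (interval labels), and
# min-max normalization applied to a single age column.
import pandas as pd

ages = pd.Series([22, 25, 31, 38, 45, 52, 61, 70])

smoothed = ages.rolling(window=3, center=True).mean()        # smoothing: 3-point moving average
age_group = pd.cut(ages, bins=[0, 30, 55, 120],
                   labels=["young", "middle age", "senior"]) # discretization: labelled intervals
normalized = (ages - ages.min()) / (ages.max() - ages.min()) # normalization: scale into [0, 1]

print(pd.DataFrame({"age": ages, "smoothed": smoothed,
                    "group": age_group, "normalized": normalized}))
```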
DATA REDUCTION
Data reduction is a technique used in data mining to reduce
the size of a dataset while still preserving the most
important information.
This can be beneficial in situations where the dataset is too
large to be processed efficiently, or where the dataset
contains a large amount of irrelevant or redundant
information.
There are several different data reduction techniques that
can be used in data mining, including:
Data Sampling: This technique involves selecting a
subset of the data to work with, rather than using the
entire dataset. This can be useful for reducing the size
of a dataset while still preserving the overall trends
and patterns in the data.
Dimensionality Reduction: This technique involves
reducing the number of features in the dataset,
either by removing features that are not relevant or
by combining multiple features into a single feature.
Data Compression: This technique involves using
techniques such as lossy or lossless compression to
reduce the size of a dataset.
Data Discretization: This technique involves
converting continuous data into discrete data by
partitioning the range of possible values into intervals
or bins.
Feature Selection: This technique involves selecting a subset of features from the dataset that are most relevant to the task at hand (see the sketch after this list).
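A small sketch of two of the techniques above, random sampling and a simple variance-based feature selection; the sampling fraction and the variance threshold are arbitrary illustrative choices:

```python
# Data sampling (keep a fraction of rows) and feature selection (drop near-constant columns).
import pandas as pd

df = pd.DataFrame({
    "salary_lpa": [8.0, 12.0, 9.5, 15.0, 7.0, 11.0],
    "bonus_lpa":  [1.0, 2.0, 1.5, 3.0, 0.5, 2.5],
    "constant":   [1, 1, 1, 1, 1, 1],          # carries no information
})

sample = df.sample(frac=0.5, random_state=0)    # data sampling: keep half the rows

variances = df.var()
selected = df[variances[variances > 0.1].index]  # feature selection: keep informative columns
print(sample, selected.columns.tolist(), sep="\n\n")
```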
It is important to note that data reduction involves a trade-off between accuracy and the size of the data: the more the data is reduced, the less accurate and the less generalizable the resulting model may be.
DISCRETIZATION
Data discretization refers to a method of converting a huge
number of data values into smaller ones so that the
evaluation and management of data become easy.
In other words, data discretization is a method of
converting attributes values of continuous data into a finite
set of intervals with minimum data loss.
There are two forms of data discretization: the first is supervised discretization, and the second is unsupervised discretization.
Supervised discretization refers to a method in which class information is used. Unsupervised discretization refers to a method that depends on the way the operation proceeds; it works with a top-down splitting strategy or a bottom-up merging strategy.
Discretization Technique:
Discretization is one form of data transformation technique. It transforms numeric values into interval labels or conceptual labels. For example, age can be transformed into intervals (0-10, 11-20, ...) or into conceptual labels such as youth, adult, senior.
There are different techniques of discretization:
Discretization by binning: This is an unsupervised method of partitioning the data into equal partitions, either by equal width or by equal frequency (see the sketch after this list).
Discretization by clustering: Clustering can be applied to discretize numeric attributes. It partitions the values into different clusters or groups by following a top-down or bottom-up strategy.
Discretization by decision tree: This employs a top-down splitting strategy. It is a supervised technique that uses class information.
Discretization by correlation analysis: ChiMerge employs a bottom-up approach, finding the best neighboring intervals and then merging them recursively to form larger intervals.
Discretization by histogram: Histogram analysis is unsupervised because, like binning, it does not use any class information. Various partitioning rules are used to define histograms.
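A brief sketch of the binning flavours mentioned in the list above: pandas' cut gives equal-width intervals, while qcut gives (approximately) equal-frequency bins. The ages and the number of bins are illustrative:

```python
# Equal-width binning (cut) versus equal-frequency binning (qcut).
import pandas as pd

age = pd.Series([18, 22, 25, 31, 38, 45, 52, 61, 70, 85])

equal_width = pd.cut(age, bins=3)    # 3 intervals of equal width
equal_freq = pd.qcut(age, q=3)       # 3 bins with roughly equal counts

print(pd.DataFrame({"age": age, "equal_width": equal_width, "equal_freq": equal_freq}))
```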
Importance of Discretization:
Discretization is important because it is useful:
To generate concept hierarchies.
To transform numeric data.
To ease evaluation and management of data.
To minimize data loss.
To produce better results.
To generate a more understandable structure, such as a decision tree.