What’s the difference between data mining and data warehousing?
Data mining is the process of finding patterns in a given data set. These
patterns can often provide meaningful insight to whoever is interested in
the data. Data mining is used today in a wide variety of contexts – in
fraud detection, as an aid in marketing campaigns, and even by
supermarkets to study their customers.
Data warehousing can be said to be the process of centralizing or
aggregating data from multiple sources into one common repository.
Example of data mining
If you’ve ever used a credit card, then you may know that credit card
companies will alert you when they think that your credit card is being
fraudulently used by someone other than you. This is a perfect example
of data mining – credit card companies have a history of your purchases
from the past and know geographically where those purchases have been
made. If all of a sudden some purchases are made in a city far from
where you live, the credit card companies are put on alert to a possible
fraud since their data mining shows that you don’t normally make
purchases in that city. Then, the credit card company can disable your
card for that transaction or just put a flag on your card for suspicious
activity.
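To make this concrete, here is a toy Python sketch of that kind of location check. The purchase history, city names, and threshold are all invented for illustration; a real card network would use far richer statistical models over many features.

```python
# Toy sketch of location-based fraud flagging (illustrative only).
# The history, cities, and threshold below are invented; real systems
# model many more features than "have we seen this city before?".

purchase_history = ["Boston", "Boston", "Cambridge", "Boston"]  # past purchase cities

def is_suspicious(city, history, min_prior_visits=1):
    """Flag a transaction made in a city the card has rarely been used in."""
    return history.count(city) < min_prior_visits

print(is_suspicious("Boston", purchase_history))     # False: a familiar city
print(is_suspicious("Las Vegas", purchase_history))  # True: never seen before
```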
Another interesting example of data mining is how one grocery store in
the USA used the data it collected on its shoppers to find patterns in
their shopping habits.
They found that when men bought diapers on Thursdays and Saturdays,
they also had a strong tendency to buy beer.
The grocery store could have used this valuable information to increase
their profits. One thing they could have done – odd as it sounds – is
move the beer display closer to the diapers. Or, they could have simply
made sure not to give any discounts on beer on Thursdays and
Saturdays. This is data mining in action – extracting meaningful data
from a huge data set.
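A minimal version of this kind of association mining can be written in a few lines of Python. The transactions below are invented; the point is just to show how "confidence" and "lift" quantify the diaper–beer pattern.

```python
# Minimal association-mining sketch: measure how often beer co-occurs
# with diapers, relative to how often beer is bought overall ("lift").
# The transactions below are invented purely for illustration.

transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
]

n = len(transactions)
p_beer = sum("beer" in t for t in transactions) / n
p_diapers = sum("diapers" in t for t in transactions) / n
p_both = sum({"diapers", "beer"} <= t for t in transactions) / n

confidence = p_both / p_diapers   # P(beer | diapers)
lift = confidence / p_beer        # > 1 means a positive association

print(f"confidence(diapers -> beer) = {confidence:.2f}")  # 1.00
print(f"lift = {lift:.2f}")                               # 1.67
```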
Example of data warehousing – Facebook
A great example of data warehousing that everyone can relate to is what
Facebook does. Facebook basically gathers all of your data – your
friends, your likes, who you stalk, etc. – and then stores that data in one
central repository. Even though Facebook most likely stores your
friends, your likes, etc., in separate databases, they do want to take the
most relevant and important information and put it into one central
aggregated database. Why would they want to do this? For many reasons
– they want to make sure that you see the most relevant ads that you’re
most likely to click on, they want to make sure that the friends that they
suggest are the most relevant to you, etc – keep in mind that this is the
data mining phase, in which meaningful data and patterns are extracted
from the aggregated data. But, underlying all these motives is the main
motive: to make more money – after all, Facebook is a business.
We can say that data warehousing is basically a process in which data
from multiple sources/databases is combined into one comprehensive
and easily accessible database. Then this data is readily available to any
business professionals, managers, etc. who need to use the data to create
forecasts – and who basically use the data for data mining.
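Here is a minimal Python sketch of that warehousing step, assuming two hypothetical source databases (friends and likes) that get merged into one central repository keyed by user. The names and fields are made up for illustration.

```python
# Minimal data warehousing sketch: extract records from several source
# "databases", transform them into one shape, and load them into a single
# central repository keyed by user. All names and fields are hypothetical.

friends_db = {"alice": ["bob", "carol"], "bob": ["alice"]}
likes_db = {"alice": ["hiking", "jazz"], "bob": ["chess"]}

warehouse = {}  # the central, aggregated repository

for user, friends in friends_db.items():   # extract + load friends
    warehouse.setdefault(user, {})["friends"] = friends

for user, likes in likes_db.items():       # extract + load likes
    warehouse.setdefault(user, {})["likes"] = likes

# Now one query answers questions that used to span two databases:
print(warehouse["alice"])
# {'friends': ['bob', 'carol'], 'likes': ['hiking', 'jazz']}
```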
Data warehousing vs. data mining
Remember that data warehousing is a process that must occur before any
data mining can take place. In other words, data warehousing is the
process of compiling and organizing data into one common database,
and data mining is the process of extracting meaningful data from that
database. The data mining process relies on the data compiled in the
data warehousing phase in order to detect meaningful patterns.
In the Facebook example that we gave, the data mining will typically be
done by business users who are not engineers, but who will most likely
receive assistance from engineers when they are trying to manipulate
their data. The data warehousing phase is a strictly engineering phase,
where no business users are involved. And this gives us another way of
defining the two terms: data mining is typically done by business users
with the assistance of engineers, and data warehousing is typically a
process done exclusively by engineers.
| Sr.No. | Data Warehouse (OLAP) | Operational Database (OLTP) |
|--------|-----------------------|-----------------------------|
| 1 | It involves historical processing of information. | It involves day-to-day processing. |
| 2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals. |
| 3 | It is used to analyze the business. | It is used to run the business. |
| 4 | It focuses on information out. | It focuses on data in. |
| 5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model. |
| 6 | It is subject oriented. | It is application oriented. |
| 7 | It contains historical data. | It contains current data. |
| 8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data. |
| 9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data. |
| 10 | The number of users is in the hundreds. | The number of users is in the thousands. |
| 11 | The number of records accessed is in the millions. | The number of records accessed is in the tens. |
| 12 | The database size is from 100 GB to 100 PB. | The database size is from 100 MB to 100 GB. |
| 13 | These are highly flexible. | It provides high performance. |
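The contrast in the table shows up in the kinds of queries each system runs. Below is a small sketch using Python's built-in sqlite3 module; the orders table and its rows are invented. The OLTP query touches one current record, while the OLAP query summarizes history.

```python
# Contrast of OLTP-style and OLAP-style queries against the same table,
# using Python's built-in sqlite3. The orders data is invented.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "alice", 2022, 30.0), (2, "bob", 2022, 45.0),
     (3, "alice", 2023, 25.0), (4, "carol", 2023, 60.0)],
)

# OLTP: touch a handful of current records (run the business).
print(conn.execute("SELECT * FROM orders WHERE id = 3").fetchone())

# OLAP: summarize large swaths of historical data (analyze the business).
print(conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year"
).fetchall())
```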
How data mining works
While large-scale information technology has been evolving separate
transaction and analytical systems, data mining provides the link
between the two. Data mining software analyzes relationships and
patterns in stored transaction data based on open-ended user queries.
Several types of analytical software are available: statistical, machine
learning, and neural networks. Generally, any of four types of
relationships are sought:
● Classes: Stored data is used to locate data in predetermined
groups. For example, a restaurant chain could mine customer
purchase data to determine when customers visit and what they
typically order. This information could be used to increase traffic
by having daily specials.
● Clusters: Data items are grouped according to logical relationships
or consumer preferences. For example, data can be mined to
identify market segments or consumer affinities (see the clustering
sketch after this list).
● Associations: Data can be mined to identify associations. The
beer-diaper example is an example of associative mining.
● Sequential patterns: Data is mined to anticipate behavior patterns
and trends. For example, an outdoor equipment retailer could
predict the likelihood of a backpack being purchased based on a
consumer's purchase of sleeping bags and hiking shoes.
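As promised above, here is a brief sketch of the "clusters" relationship using scikit-learn's KMeans. The library choice and the customer figures are my own assumptions, not something the text prescribes.

```python
# Sketch of cluster-style mining: group customers into market segments
# by annual spend and visit frequency. The data is invented; assumes
# scikit-learn and NumPy are installed.

import numpy as np
from sklearn.cluster import KMeans

# Each row: [annual spend in dollars, visits per month]
customers = np.array([
    [200, 1], [250, 2], [220, 1],      # light shoppers
    [1200, 8], [1100, 9], [1300, 7],   # heavy shoppers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)
print(segments)  # e.g. [0 0 0 1 1 1]: two market segments
```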
Data mining consists of five major elements:
● Extract, transform, and load transaction data onto the data
warehouse system.
● Store and manage the data in a multidimensional database system
(see the pivot-table sketch after this list).
● Provide data access to business analysts and information
technology professionals.
● Analyze the data with application software.
● Present the data in a useful format, such as a graph or table.
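The "store in a multidimensional system" and "present in a useful format" elements can be approximated in a few lines with a pandas pivot table. This is only a sketch, assuming pandas is installed; the sales rows are made up.

```python
# Sketch of a multidimensional, summarized view of transaction data,
# in the spirit of the storage and presentation steps above. Assumes
# pandas is installed; the sales rows are invented.

import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, 120.0, 90.0, 130.0],
})

# One dimension per axis (region x quarter), revenue summed in each cell.
cube = sales.pivot_table(values="revenue", index="region",
                         columns="quarter", aggfunc="sum")
print(cube)
```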
Different levels of analysis are available:
● Artificial neural networks: Non-linear predictive models that
learn through training and resemble biological neural networks in
structure.
● Genetic algorithms: Optimization techniques that use processes
such as genetic combination, mutation, and natural selection in a
design based on the concepts of natural evolution.
● Decision trees: Tree-shaped structures that represent sets of
decisions. These decisions generate rules for the classification of a
dataset. Specific decision tree methods include Classification and
Regression Trees (CART) and Chi-Square Automatic Interaction
Detection (CHAID). CART and CHAID are decision tree
techniques used for classification of a dataset. They provide a set
of rules that you can apply to a new (unclassified) dataset to
predict which records will have a given outcome. CART segments
a dataset by creating 2-way splits while CHAID segments using
chi-square tests to create multi-way splits. CART typically requires
less data preparation than CHAID (see the sketch after this list).
● Nearest neighbor method: A technique that classifies each record
in a dataset based on a combination of the classes of the k record(s)
most similar to it in a historical dataset (where k ≥ 1). Sometimes
called the k-nearest neighbor technique; it is also illustrated in the
sketch after this list.
● Rule induction: The extraction of useful if-then rules from data
based on statistical significance.
● Data visualization: The visual interpretation of complex
relationships in multidimensional data. Graphics tools are used to
illustrate data relationships.
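To ground the decision tree and nearest neighbor entries, here is a small sketch using scikit-learn (an assumption on my part; the text names no library). DecisionTreeClassifier builds CART-style binary splits, and KNeighborsClassifier classifies by the k most similar records. The toy purchase data echoes the backpack example earlier.

```python
# Sketch of two of the techniques above on the same toy data:
# a CART-style decision tree and k-nearest neighbors. Assumes
# scikit-learn is installed; the data is invented.

from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Each row: [sleeping bags bought, hiking shoes bought]; label: bought a backpack?
X = [[1, 1], [1, 0], [0, 1], [0, 0], [2, 1], [0, 2]]
y = [1, 0, 0, 0, 1, 1]

tree = DecisionTreeClassifier().fit(X, y)            # CART-style 2-way splits
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # vote among k=3 neighbors

new_customer = [[1, 2]]  # bought one sleeping bag and two pairs of shoes
print(tree.predict(new_customer))  # e.g. [1]: likely to buy a backpack
print(knn.predict(new_customer))   # e.g. [1]
```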
What technological infrastructure is required?
Today, data mining applications are available on all size systems for
mainframe, client/server, and PC platforms. System prices range from
several thousand dollars for the smallest applications up to $1 million a
terabyte for the largest. Enterprise-wide applications generally range in
size from 10 gigabytes to over 11 terabytes. NCR has the capacity to
deliver applications exceeding 100 terabytes. There are two critical
technological drivers:
● Size of the database: the more data being processed and
maintained, the more powerful the system required.
● Query complexity: the more complex the queries and the greater
the number of queries being processed, the more powerful the
system required.
Relational database storage and management technology is adequate for
many data mining applications of less than 50 gigabytes. However, this
infrastructure needs to be significantly enhanced to support larger
applications. Some vendors have added extensive indexing capabilities
to improve query performance. Others use new hardware architectures
such as Massively Parallel Processors (MPP) to achieve order-of-
magnitude improvements in query time. For example, MPP systems
from NCR link hundreds of high-speed Pentium processors to achieve
performance levels exceeding those of the largest supercomputers.