Datamining 1

Uploaded by

castiron1998

Introduction

DATA MINING

Why Data Mining?
"Necessity, who is the mother of invention." – Plato

• We are drowning in data, but starving for knowledge!
• The explosive growth of data: from terabytes to petabytes
• Major sources of abundant data
  • Business: Web, e-commerce, transactions, stocks, …
  • Science: remote sensing, bioinformatics, scientific simulation, …
  • Society and everyone: news, digital cameras, YouTube

Why Data Mining?
• Data mining turns a large collection of data into knowledge
  • A search engine (e.g., Google) receives hundreds of millions of queries every day
  • Each query can be viewed as a transaction in which the user describes her or his information need
  • Some patterns found in user search queries can disclose invaluable knowledge that cannot be obtained by reading individual data items alone

Data Mining

searching for knowledge (interesting patterns) in data.

What Is Data Mining?

• Data mining (knowledge discovery from data)
  • Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
  • Data mining: a misnomer?
• Alternative names
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: is everything “data mining”? No; these, for example, are not:
  • Simple search and query processing
  • (Deductive) expert systems

Data Mining Applications

Data Mining for Financial Data Analysis

• Design and construction of data warehouses
• Loan payment prediction and customer credit policy analysis
• Classification and clustering of customers for targeted marketing
• Detection of money laundering and other financial crimes

Knowledge Discovery (KDD) Process
• This is a view from typical database systems and data warehousing communities
• Data mining plays an essential role in the knowledge discovery process
[Figure: KDD process flow: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Knowledge Discovery (KDD) Process
• Data cleaning (to remove noise and inconsistent data)
• Data integration (where multiple data sources may be combined)
• Data selection (where data relevant to the analysis task are retrieved from the database)
• Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)
• Data mining (an essential process where intelligent methods are applied to extract data patterns)
• Pattern evaluation (to identify the truly interesting patterns representing knowledge based on interestingness measures)
• Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)
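The seven steps above can be read as a pipeline. The following is only a toy sketch in Python: the two record sources (`store_a`, `store_b`), the function names, the `min_amount` and `min_count` thresholds, and the deliberately trivial "mining" step (counting purchases per item) are all illustrative assumptions, not something the slides prescribe.

```python
from collections import Counter

# Hypothetical raw sources: (customer, item, amount); None marks a missing value.
store_a = [("ann", "laptop", 900.0), ("bob", "camera", None), ("ann", "camera", 300.0)]
store_b = [("cy", "laptop", 950.0), ("dee", "Laptop ", 870.0)]

def clean(rows):
    """Data cleaning: drop rows with missing amounts, normalize item names."""
    return [(c, item.strip().lower(), amt) for c, item, amt in rows if amt is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [row for src in sources for row in src]

def select(rows, min_amount):
    """Data selection: keep only rows relevant to the analysis task."""
    return [r for r in rows if r[2] >= min_amount]

def transform(rows):
    """Data transformation: reduce each row to the item purchased."""
    return [item for _, item, _ in rows]

def mine(items):
    """Data mining (toy stand-in): count how often each item occurs."""
    return Counter(items)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet an interestingness threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: render the result as readable lines."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]

rows = select(integrate(clean(store_a), clean(store_b)), min_amount=500.0)
report = present(evaluate(mine(transform(rows)), min_count=2))
print(report)
```

Each function corresponds to one bullet, so a step can be swapped out (for example, replacing `mine` with a real pattern-mining method) without touching the others.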

Data Warehouses
• A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
• It is usually modeled by a multidimensional data structure called a data cube
  • In a data cube, each dimension corresponds to an attribute or a set of attributes in the schema
  • Each cell stores the value of some aggregate measure, such as count
• A data cube provides a multidimensional view of data and allows the precomputation and fast access of summarized data
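As a rough illustration of the data cube idea, the sketch below precomputes a count for every combination of dimensions over a handful of made-up sales facts. The dimension names, the sample rows, and the use of `"*"` to mean "aggregated over this dimension" are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales facts with three dimensions: (city, item, quarter).
facts = [
    ("Delhi", "laptop", "Q1"),
    ("Delhi", "laptop", "Q2"),
    ("Delhi", "camera", "Q1"),
    ("Pune", "laptop", "Q1"),
]
DIMS = ("city", "item", "quarter")

def build_cube(facts):
    """Precompute the count measure for every subset of dimensions.

    A cell key is a tuple with one slot per dimension; "*" means that
    dimension has been aggregated away.
    """
    cube = Counter()
    n = len(DIMS)
    for fact in facts:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(fact[i] if i in kept else "*" for i in range(n))
                cube[key] += 1
    return cube

cube = build_cube(facts)
print(cube[("Delhi", "*", "*")])     # all sales in Delhi
print(cube[("*", "laptop", "Q1")])   # laptops sold in Q1, any city
print(cube[("*", "*", "*")])         # grand total
```

Once the cube is built, any summarized view is a single dictionary lookup, which is the "fast access of summarized data" the slide refers to. The cost is storage: every fact contributes to 2^d cells for d dimensions.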

Data Mining: On What Kinds of Data?

• Database-oriented data sets and applications
  • Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data (incl. bio-sequences)
  • Structured data, graphs, social networks, and multi-linked data
  • Object-relational databases
  • Heterogeneous databases and legacy databases
  • Spatial data and spatiotemporal data
  • Multimedia databases
  • Text databases
  • The World Wide Web

Data Mining Functionalities
• Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks
• In general, such tasks can be classified into two categories:
  • Descriptive: characterizes properties of the data in a target data set
  • Predictive: performs induction on the current data in order to make predictions

Generalization

• Information integration and data warehouse construction
  • Data cleaning, transformation, integration, and multidimensional data model
• Multidimensional concept description: characterization and discrimination
  • Generalize, summarize, and contrast data characteristics

Example: Data Characterization
• A customer relationship manager at “ABCElectronics” may order the following data mining task: summarize the characteristics of customers who spend more than $5000 a year at “ABCElectronics”.
• The result is a general profile of these customers, such as that they are 40 to 50 years old, employed, and have excellent credit ratings.
• The data mining system should allow the customer relationship manager to drill down on any dimension, such as occupation, to view these customers according to their type of employment.
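A minimal sketch of the characterization task, assuming a small in-memory list of customer records. The field names and every value below are made up for illustration; a real system would run this against the warehouse.

```python
from collections import Counter

# Hypothetical customer records; all values are invented for illustration.
customers = [
    {"age": 42, "occupation": "engineer", "credit": "excellent", "spend": 7200},
    {"age": 47, "occupation": "teacher", "credit": "excellent", "spend": 5600},
    {"age": 45, "occupation": "engineer", "credit": "good", "spend": 6100},
    {"age": 23, "occupation": "student", "credit": "fair", "spend": 800},
]

# Target class: customers who spend more than $5000 a year.
target = [c for c in customers if c["spend"] > 5000]

# Characterization: summarize the target class along a few attributes.
profile = {
    "count": len(target),
    "age_range": (min(c["age"] for c in target), max(c["age"] for c in target)),
    "most_common_credit": Counter(c["credit"] for c in target).most_common(1)[0][0],
}

# Drill down on the occupation dimension.
by_occupation = Counter(c["occupation"] for c in target)

print(profile)
print(dict(by_occupation))
```

The `profile` dictionary plays the role of the general profile, and `by_occupation` is the drill-down view by type of employment.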

Example: Data Discrimination
• A customer relationship manager at “ABCElectronics” may want to compare two groups of customers: those who shop for computer products regularly (e.g., more than twice a month) and those who rarely shop for such products (e.g., fewer than three times a year).
• The resulting description provides a general comparative profile of these customers, such as that 80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education, whereas 60% of the customers who infrequently buy such products are either seniors or youths and have no university degree.

Mining Frequent Patterns, Association and Correlation Analysis
• Frequent patterns (for example, frequent item sets) are patterns that occur frequently in data.
• A frequent item set typically refers to a set of items that often appear together in a transactional data set
  • For example, milk and bread, which are frequently bought together in grocery stores by many customers
  • What items are frequently purchased together in your Walmart?
• A frequently occurring subsequence, such as the pattern that customers tend to purchase first a laptop, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern
• Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
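To make "items that often appear together" concrete, here is a small sketch that counts every item pair by brute force and keeps the pairs that reach a minimum support. The transactions and the `min_support` value are hypothetical, and this is plain counting rather than an optimized algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (the fraction of transactions
    containing both items) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(transactions, min_support=0.4)
print(pairs)
```

With this data, milk and bread appear together in three of five baskets, so that pair survives the threshold, while pairs seen only once are dropped.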

Association and Correlation Analysis

• Suppose that, as a marketing manager at “ABCElectronics”, you want to know which items are frequently purchased together
• An example of such a rule:
  buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]
• A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well
• A 1% support means that 1% of all the transactions under analysis show that computer and software are purchased together
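The two measures can be computed directly from a list of transactions. The sketch below uses a made-up data set sized so that the numbers match the rule above; the helper names `support` and `confidence` are illustrative, and `confidence` assumes the antecedent occurs at least once (otherwise it would divide by zero).

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical data: 200 transactions, 4 contain a computer,
# and 2 of those 4 also contain software.
transactions = (
    [{"computer", "software"}] * 2
    + [{"computer"}] * 2
    + [{"printer"}] * 196
)

s = support(transactions, {"computer", "software"})
c = confidence(transactions, {"computer"}, {"software"})
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the antecedent, which is why a rule can have low support and still high confidence.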

Question

• A data mining system may find association rules as follows:
  age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 2%, confidence = 60%]
• What does the above association rule indicate?

Answer
• The rule indicates that of all the customers under study, 2% are 20 to 29 years old with an income of $40,000 to $49,000 and have purchased a laptop (computer)
• There is a 60% probability that a customer in this age and income group will purchase a laptop.
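Written with the usual probabilistic definitions of support and confidence (an assumption here, since the slides state them only informally), let A be "age 20..29 and income 40K..49K" and B be "buys a laptop". The two figures in the rule then also determine how large the age and income group is:

```latex
\begin{align*}
\text{support}(A \Rightarrow B) &= P(A \cap B) = 0.02,\\
\text{confidence}(A \Rightarrow B) &= P(B \mid A) = \frac{P(A \cap B)}{P(A)} = 0.60,\\
P(A) &= \frac{P(A \cap B)}{P(B \mid A)} = \frac{0.02}{0.60} \approx 0.033.
\end{align*}
```

So, under these definitions, roughly 3.3% of all customers under study fall in this age and income group, and 60% of them bought a laptop.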

Classification
• Classification and label prediction
  • Construct models (functions) based on some training examples
  • Describe and distinguish classes or concepts for future prediction
    • E.g., classify countries based on climate, or classify cars based on gas mileage
  • Predict some unknown class labels
• Typical methods
  • Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
• Typical applications: credit card fraud detection, direct
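As a small illustration of constructing a model from training examples, the sketch below learns a one-level decision tree (a "decision stump") that classifies cars by gas mileage. The training data and the class names are hypothetical, and the code assumes exactly two classes and a single numeric feature.

```python
# Hypothetical training examples: (gas mileage in mpg, class label).
training = [(12, "guzzler"), (15, "guzzler"), (18, "guzzler"),
            (28, "economy"), (33, "economy"), (40, "economy")]

def train_stump(examples):
    """Learn a one-level decision tree: pick the threshold on the single
    feature that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in examples)
    # Candidate thresholds: midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    labels = sorted({y for _, y in examples})  # assumes exactly two labels
    best = None
    for t in candidates:
        # Try both ways of assigning the two labels to the two branches.
        for low, high in (labels, labels[::-1]):
            errors = sum((low if x <= t else high) != y for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high

classify = train_stump(training)
print(classify(14), classify(35))
```

The returned function is the learned model: it can be applied to cars that were not in the training set to predict their unknown class label.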

Some Classification Tools

Classification and Regression

• Suppose as a sales manager you want to classify a large set of items in the store, based on three kinds of responses to a sales campaign: good response, mild response, and no response.
• You want to derive a model for each of these three classes based on the descriptive features of the items, such as price, brand, place made, type, and category
• Suppose instead that, rather than predicting categorical response labels for each store item, you would like to predict the amount of revenue that each item will generate during an upcoming sale, based on the previous sales data
• This is an example of regression
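To contrast regression with classification, the sketch below fits a straight line by ordinary least squares and uses it to predict a numeric value. Using item price as the only predictor, and the specific numbers, are assumptions made for the example; the slide does not say which features or model would be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical previous-sale data: item price -> revenue during the sale.
prices = [10.0, 20.0, 30.0, 40.0]
revenue = [120.0, 210.0, 310.0, 400.0]

slope, intercept = fit_line(prices, revenue)
predicted = slope * 25.0 + intercept  # numeric prediction for a new item
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The output is a continuous quantity (expected revenue) rather than one of a fixed set of labels, which is the distinction the slide draws.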

Cluster Analysis
• Unsupervised learning (i.e., class label is unknown)
• Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns
• Principle: maximizing intra-class similarity and minimizing interclass similarity
• Many methods and applications
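One of the many methods is k-means, sketched below on a single numeric attribute. The house prices, the initial centers, and the fixed iteration count are illustrative assumptions; k-means is named here as one example technique, not as the method the slides have in mind.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on one numeric attribute.

    `centers` holds the initial guesses; each iteration assigns every point
    to its nearest center, then moves each center to the mean of its group.
    """
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical house prices (in thousands); no class labels are given.
prices = [100, 110, 120, 300, 310, 320]
centers, clusters = kmeans_1d(prices, centers=[100, 120])
print(centers, clusters)
```

Points inside a cluster end up close to their own center (high intra-class similarity) and far from the other center (low interclass similarity), without any labels having been supplied.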

Outlier Analysis

• Outlier analysis
  • Outlier: a data object that does not comply with the general behavior of the data
  • Noise or exception? One person’s garbage could be another person’s treasure
  • Methods: by-product of clustering or regression analysis, …
  • Useful in fraud detection, rare events analysis
• Example: outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of unusually large amounts for a given account number in comparison to regular charges incurred by the same account.
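The credit card example can be sketched with a simple robust rule: flag a charge when it lies many median absolute deviations (MAD) away from the account's median charge. This is a stand-in statistical technique chosen for brevity, not the clustering or regression by-product the slide mentions; the charges and the `threshold` of 5 are assumptions.

```python
from statistics import median

def unusual_charges(charges, threshold=5.0):
    """Flag charges far from the account's typical charge.

    Distance is measured in units of the median absolute deviation (MAD),
    which a single huge charge cannot inflate the way it inflates a mean.
    """
    typical = median(charges)
    mad = median(abs(c - typical) for c in charges)
    if mad == 0:
        # All typical charges are identical: anything different stands out.
        return [c for c in charges if c != typical]
    return [c for c in charges if abs(c - typical) / mad > threshold]

# Hypothetical charges on one account.
charges = [25, 40, 32, 28, 35, 30, 1500]
flagged = unusual_charges(charges)
print(flagged)
```

Because each account is compared only with its own history, a charge that is routine for one account can still be flagged on another.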

Technologies Used
• Statistics
  • Data mining has an inherent connection with statistics.
  • Statistics studies the collection, analysis, interpretation or explanation, and presentation of data
  • Statistical models are widely used to model data and data classes

Technologies Used

• Machine Learning
  • Machine learning investigates how computers can learn (or improve their performance) based on data
  • For example, a typical machine learning problem is to program a computer so that it can automatically recognize handwritten postal codes on mail after learning from a set of examples

Technologies Used

• Information Retrieval
  • Information retrieval is the science of searching for documents or information in documents
  • Documents can be text or multimedia, and may reside on the Web

Major Issues
• Mining various and new kinds of knowledge
• Mining knowledge in multidimensional space
• Data mining: an interdisciplinary effort
• Handling uncertainty, noise, or incompleteness of data
• Pattern evaluation and pattern- or constraint-guided mining
