02 - Data Mining

The document discusses several common data mining tasks including classification, clustering, association rule discovery, sequential pattern discovery, regression, and deviation detection. It provides definitions and examples of how each task can be applied to real-world problems.


Data Mining:
Data Mining Tasks & Data

Data Mining - Lecture 2

Data Mining Tasks

• Prediction Tasks
  - Use some variables to predict unknown or future values of other variables.
• Description Tasks
  - Find human-interpretable patterns that describe the data.

Common data mining tasks

• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]


Classification: Definition

• Given a collection of records (the training set):
  - Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
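
The train/test division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records, the 30% test fraction, and the helper names `train_test_split`, `fit_threshold_model`, and `accuracy` are all hypothetical and do not come from the lecture.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold_model(training_set):
    """Build a very simple model: split at the midpoint between the class means."""
    lows = [a["income"] for a, label in training_set if label == "low"]
    highs = [a["income"] for a, label in training_set if label == "high"]
    threshold = (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2
    return lambda attributes: "high" if attributes["income"] >= threshold else "low"

def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for attributes, label in test_set if model(attributes) == label)
    return correct / len(test_set)

# Hypothetical toy data: (attributes, class) pairs, five records per class.
records = [({"income": 10 * i}, "high" if i >= 5 else "low") for i in range(10)]

train, test = train_test_split(records, test_fraction=0.3, seed=0)
model = fit_threshold_model(train)        # built from the training set only
holdout_accuracy = accuracy(model, test)  # validated on records the model never saw
```

The point of the design is that `fit_threshold_model` never sees the test records, so `holdout_accuracy` estimates how the model behaves on previously unseen data.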


Classification Example

Training Set:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set:

Refund  Marital Status  Taxable Income  Cheat  Cheat
No      Single          75K             ?      No
Yes     Married         50K             ?      No
No      Married         150K            ?      No
Yes     Divorced        90K             ?      Yes
No      Single          40K             ?      No
No      Married         80K             ?      No

[Figure: the Training Set is used to learn a classifier, which produces a Model; the Model is then applied to the Test Set.]
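
To make the example concrete, the sketch below applies a small rule-based model to the two tables. Two things here are assumptions rather than content of the lecture: the rules in `predict` are hand-written stand-ins for whatever model a learning algorithm would produce, and the second Cheat column of the test set is treated as the true class.

```python
# Training records from the slide: (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
TRAINING_SET = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

# Test records: (Refund, Marital Status, Taxable Income in K, Cheat).
# Assumption: the class value is taken from the second Cheat column.
TEST_SET = [
    ("No", "Single", 75, "No"),
    ("Yes", "Married", 50, "No"),
    ("No", "Married", 150, "No"),
    ("Yes", "Divorced", 90, "Yes"),
    ("No", "Single", 40, "No"),
    ("No", "Married", 80, "No"),
]

def predict(refund, marital_status, income_k):
    """Hypothetical hand-written rules standing in for a learned model."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return "Yes" if income_k >= 80 else "No"

def accuracy(rows):
    """Fraction of rows (attributes..., class) that the rules classify correctly."""
    correct = sum(1 for *attributes, label in rows if predict(*attributes) == label)
    return correct / len(rows)

train_accuracy = accuracy([row[1:] for row in TRAINING_SET])  # drop the Tid column
test_accuracy = accuracy(TEST_SET)
```

Under these assumptions the rules reproduce all ten training labels but miss one of the six test records (the refunded, divorced customer), which is why accuracy is measured on the test set rather than on the training set.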


Classification: Application 1

• Direct Marketing
  - Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
  - Approach:
    - Use the data for a similar product introduced before.
    - We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
    - Collect various demographic, lifestyle, and company-interaction related information about all such customers (type of business, where they stay, how much they earn, etc.).
    - Use this information as input attributes to learn a classifier model.


Classification: Application 2

• Fraud Detection
  - Goal: Predict fraudulent cases in credit card transactions.
  - Approach:
    - Use credit card transactions and information about the account holder as attributes (when a customer buys, what they buy, how often they pay on time, etc.).
    - Label past transactions as fraudulent or fair. This forms the class attribute.
    - Learn a model for the class of the transactions.
    - Use this model to detect fraud by observing credit card transactions on an account.


Clustering Definition

• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
  - Data points in one cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.
• Similarity Measures:
  - Euclidean distance if attributes are continuous.
  - Other problem-specific measures.
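
As a rough illustration of clustering with Euclidean distance, here is a minimal k-means sketch. The lecture does not name a clustering algorithm, so k-means is a swapped-in choice; the six points and the starting centroids are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical 2-D data: two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])
```

With this data the first three points end up in one cluster and the last three in the other, so points within a cluster are closer to each other than to points in the other cluster.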

Clustering: Application 1

• Market Segmentation:
  - Goal: Subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
  - Approach:
    - Collect different attributes of customers based on their geographical and lifestyle related information.
    - Find clusters of similar customers.
    - Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters.


Clustering: Application 2

• Document Clustering:
  - Goal: Find groups of documents that are similar to each other based on the important terms appearing in them.
  - Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
  - Gain: Information retrieval can use the clusters to relate a new document or search term to the clustered documents.
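
One possible similarity measure based on term frequencies is cosine similarity, sketched below. The lecture does not say which measure to use, so cosine similarity is an assumption, and the three short documents are made up for illustration.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each whitespace-separated term occurs in a document."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical documents.
docs = {
    "sports_1": "team coach game score team win",
    "sports_2": "coach team season game lost",
    "finance": "market stock price market bank",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sports_similarity = cosine_similarity(tf["sports_1"], tf["sports_2"])
cross_similarity = cosine_similarity(tf["sports_1"], tf["finance"])
```

The two sports documents share several terms and score well above zero, while the sports and finance documents share none and score exactly zero; a clustering step would use such scores to group documents.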


Association Rule Discovery: Definition

• Given a set of records, each of which contains some number of items from a given collection:
  - Produce dependency rules that predict the occurrence of an item based on the occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Juice, Bread
3    Juice, Coke, Diaper, Milk
4    Juice, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Juice}
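
The strength of such rules is commonly summarized by support and confidence. These two measures are standard in association analysis but are not defined in the lecture, so treat the sketch below as supplementary; it uses the five transactions from the table, and the function names are illustrative.

```python
# The five transactions from the table above, one set of items per TID.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Juice", "Bread"},
    {"Juice", "Coke", "Diaper", "Milk"},
    {"Juice", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

milk_coke_support = support({"Milk", "Coke"})                        # 3 of 5 transactions
milk_coke_confidence = confidence({"Milk"}, {"Coke"})                # 3 of the 4 Milk transactions
diaper_milk_juice_confidence = confidence({"Diaper", "Milk"}, {"Juice"})  # 2 of 3
```

For the table above, {Milk} --> {Coke} holds in 3 of the 4 transactions that contain Milk, and {Diaper, Milk} --> {Juice} holds in 2 of the 3 transactions that contain both Diaper and Milk.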


Association Rule Discovery: Application 1

• Marketing and Sales Promotion:
  - Let the rule discovered be {Cookies, …} --> {Potato Chips}
  - Potato Chips as consequent => can be used to determine what should be done to boost its sales.
  - Cookies in the antecedent => can be used to see which products would be affected if the store discontinues selling Cookies.
  - Cookies in the antecedent and Potato Chips in the consequent => can be used to see what products should be sold with Cookies to promote the sale of Potato Chips!


Association Rule Discovery: Application 2

• Supermarket shelf management.
  - Goal: Identify items that are bought together by sufficiently many customers.
  - Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
  - A classic rule:
    - If a customer buys diapers and milk, then they are very likely to buy juice.
    - So, don't be surprised if you find six-packs stacked next to diapers!


Sequential Pattern Discovery: Definition

• Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
• In point-of-sale transaction sequences:
  - Computer Bookstore:
    (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies)
  - Athletic Apparel Store:
    (Shoes) (Racket, Racketball) --> (Sports_Jacket)
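
A minimal sketch of how a sequential pattern can be checked against customer timelines follows. The four customer sequences, the extra items in them, and the helper name `contains` are hypothetical; only the athletic apparel pattern itself comes from the slide.

```python
def contains(sequence, pattern):
    """True if the pattern's elements occur, in order, within the sequence.

    Both arguments are lists of item sets; a pattern element matches a
    sequence element when it is a subset of it.
    """
    position = 0
    for element in pattern:
        while position < len(sequence) and not element <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # later pattern elements must match strictly later events
    return True

# Pattern from the slide, read as one ordered sequence of purchases.
pattern = [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}]

# Hypothetical customer timelines (one list of purchases per customer).
customers = [
    [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}],
    [{"Shoes"}, {"Sports_Jacket"}],
    [{"Racket", "Racketball"}, {"Shoes"}, {"Sports_Jacket"}],
    [{"Shoes", "Socks"}, {"Racket", "Racketball", "Balls"}, {"Cap"}, {"Sports_Jacket"}],
]

matches = [contains(c, pattern) for c in customers]
pattern_support = sum(matches) / len(customers)
```

Only the first and last customers buy the three groups of items in the required order, so the pattern is supported by half of these timelines; the third customer buys the same items but in a different order and does not count.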


Regression

• Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
• Extensively studied in statistics and in the neural network field.
• Examples:
  - Predicting the age of a person based on MaritalStatus, NumberOfChildren, Income, …
    - E.g., if MaritalStatus = Yes, Age = 20 + 4*NumberOfChildren + 0.0001*Income + …
  - Predicting wind velocities as a function of temperature, humidity, and pressure.
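
For the linear case, the dependency can be estimated by ordinary least squares. The sketch below fits a line with a single predictor; the data are synthetic and generated only from the first two terms of the age example (20 + 4*NumberOfChildren), so the Income term and the elided terms are deliberately left out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic, noise-free data (an assumption made for illustration only).
children = [0, 1, 2, 3, 4]
ages = [20 + 4 * c for c in children]

intercept, slope = fit_line(children, ages)
predicted_age = intercept + slope * 5  # prediction for an unseen value of the predictor
```

Because the synthetic data are exactly linear, the fit recovers the intercept 20 and the slope 4 that generated them; with real data the estimates would only approximate the underlying relationship.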

Deviation/Anomaly Detection

• Detect significant deviations from normal behavior.
• Applications:
  - Credit Card Fraud Detection
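
One simple way to flag deviations from normal behavior is a z-score test, sketched below. The lecture does not prescribe a method, so the z-score rule, the threshold of 2.0, and the transaction amounts are all assumptions chosen for illustration.

```python
import statistics

def flag_deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical card transaction amounts for one account.
amounts = [25, 30, 22, 28, 31, 27, 24, 950]
flagged = flag_deviations(amounts)
```

Here only the 950 transaction is flagged. A single extreme value also inflates the mean and standard deviation it is compared against, so practical systems often prefer more robust statistics.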

What is Data?

• Collection of data objects and their attributes.
• An attribute is a property or characteristic of an object.
  - Examples: eye color of a person, temperature, etc.
  - An attribute is also known as a variable, field, characteristic, or feature.
• A collection of attributes describes an object.
  - An object is also known as a record, point, case, sample, entity, or instance.

In the table below, the columns are the attributes and the rows are the objects.

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes


Types of data sets

• Record
  - Data Matrix
  - Document Data
  - Transaction Data
• Multi-Relational
  - Star or snowflake schema
• Graph
  - World Wide Web
  - Molecular Structures
• Ordered
  - Sequential Data
  - Spatial Data
  - Temporal Data


Record Data

• Data that consists of a collection of records, each of which consists of a fixed set of attributes.

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes


Data Matrix

• If data objects have the same fixed set of numeric attributes, then the data objects can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute.
• Such a data set can be represented by an m by n matrix, where there are m rows, one for each object, and n columns, one for each attribute.

Projection of x Load  Projection of y Load  Distance  Load  Thickness
10.23                 5.27                  15.22     2.7   1.2
12.65                 6.25                  16.22     2.2   1.1
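
In code, such a matrix is often held as a list of rows. The sketch below stores the two objects from the table; the plain-list representation and the helper name `column` are illustrative choices, not something the lecture specifies.

```python
# Attribute names (the n columns) and the objects (the m rows) from the table above.
columns = ["Projection of x Load", "Projection of y Load", "Distance", "Load", "Thickness"]
data = [
    [10.23, 5.27, 15.22, 2.7, 1.2],
    [12.65, 6.25, 16.22, 2.2, 1.1],
]

m, n = len(data), len(data[0])  # m objects, n attributes

def column(name):
    """All values of one attribute, i.e. one column of the matrix."""
    index = columns.index(name)
    return [row[index] for row in data]

thickness = column("Thickness")
```

Each row is one point in a 5-dimensional space, so distance-based methods such as clustering can operate on the rows directly.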


Document Data

• Each document becomes a 'term' vector:
  - Each term is a component (attribute) of the vector.
  - The value of each component is the number of times the corresponding term occurs in the document.

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1  3     0      5     0     2      6     0    2     0        2
Document 2  0     7      0     2     1      0     0    3     0        0
Document 3  0     1      0     0     1      2     2    0     3        0
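
The conversion from raw text to term vectors can be sketched as follows. The vocabulary and the two short documents are hypothetical and much smaller than the table above; splitting on whitespace is a simplifying assumption.

```python
from collections import Counter

def term_vector(text, vocabulary):
    """Count occurrences of each vocabulary term, in vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical vocabulary and documents.
vocabulary = ["team", "coach", "game", "score"]
documents = [
    "team game team score game game",
    "coach coach game",
]

# One row per document, one column per term: a document-term matrix.
matrix = [term_vector(text, vocabulary) for text in documents]
```

Fixing the vocabulary order up front is what makes the vectors comparable: component i always refers to the same term in every document.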


Transaction Data

• A special type of record data, where each record (transaction) involves a set of items.
• For example, consider a grocery store. The set of products purchased by a customer during one shopping trip constitutes a transaction, while the individual products that were purchased are the items.

TID  Items
1    Bread, Coke, Milk
2    Juice, Bread
3    Juice, Coke, Diaper, Milk
4    Juice, Bread, Diaper, Milk
5    Coke, Diaper, Milk
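
Transaction data maps naturally onto sets, and it can be turned into ordinary record data with one 0/1 attribute per item. The sketch below shows both forms for the table above; the binary encoding is a common convention rather than something the lecture prescribes.

```python
# Transactions from the table above, keyed by TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Juice", "Bread"},
    3: {"Juice", "Coke", "Diaper", "Milk"},
    4: {"Juice", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

# Every distinct item, in a fixed (alphabetical) order.
items = sorted(set().union(*transactions.values()))

# Record-style view: one 0/1 attribute per item for each transaction.
binary_rows = {
    tid: [int(item in basket) for item in items]
    for tid, basket in transactions.items()
}
```

In the binary view every transaction has the same fixed set of attributes, which is what makes it a special case of record data.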


Multi-Relational Data

• Attributes are objects themselves


Graph Data

• Examples: Generic graph and HTML Links

[Figure: generic graph with numbered nodes and edges.]

<a href="papers/papers.html#bbbb">
Data Mining </a>
<li>
<a href="papers/papers.html#aaaa">
Graph Partitioning </a>
<li>
<a href="papers/papers.html#aaaa">
Parallel Solution of Sparse Linear System of Equations </a>
<li>
<a href="papers/papers.html#ffff">
N-Body Computation and Dense Linear System Solvers
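
HTML links become graph data once each page is treated as a node and each `href` as an edge. The sketch below extracts the link targets with Python's standard `html.parser` module; the page name `index.html` and the shortened HTML fragment are assumptions made for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href target of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Shortened fragment in the style of the slide's example.
page_html = """
<li><a href="papers/papers.html#bbbb">Data Mining</a>
<li><a href="papers/papers.html#aaaa">Graph Partitioning</a>
"""

collector = LinkCollector()
collector.feed(page_html)

# Adjacency list: the (hypothetical) source page points to each linked target.
graph = {"index.html": collector.links}
```

Repeating this for every page yields an adjacency list for the whole site, which is the usual input format for graph mining methods.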


Chemical Data
• Benzene Molecule: C6H6


Ordered Data
• Sequences of transactions

[Figure: a sequence of transactions; each element of the sequence is a set of items/events.]


Ordered Data
• Genomic sequence data
GGTTCCGCCTTCAGCCCCGCGCC
CGCAGGGCCCGCCCCGCGCCGTC
GAGAAGGGCCCGCCTGGCGGGCG
GGGGGAGGCGGGGCCGCCCGAGC
CCAACCGAGTCCGACCAGGTGCC
CCCTCTGCTCGGCCTAGACCTGA
GCTCATTAGGCGGCAGCGGACAG
GCCAAGTAGAACACGCGAAGCGC
TGGGCTGCCTGCTGCGACCAGGG


Ordered Data

• Spatio-Temporal Data

[Figure: average monthly temperature of land and ocean.]
