Data Mining Association Analysis

The document discusses Association Rule Mining, which aims to identify rules predicting the occurrence of items in transactions based on co-occurrence. It defines key concepts such as frequent itemsets, support, and confidence, and outlines the two-step approach for mining association rules: frequent itemset generation followed by rule generation. The document also highlights the computational challenges and strategies, including the Apriori principle, to efficiently reduce the number of candidates and transactions.

Uploaded by yesomo9051

Data Mining: Association Analysis

Association Rule Mining
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items
in the transaction

Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Implication means co-occurrence, not causality!
Definition: Frequent Itemset
• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
• Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold
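The definitions above translate directly into code. The following is a minimal Python sketch, assuming each transaction is stored as a set of item names; the names `TRANSACTIONS`, `support_count`, `support` and `is_frequent` are our own illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def support(itemset, transactions):
    """s(X) = sigma(X) / |T|, kept as an exact fraction."""
    return Fraction(support_count(itemset, transactions), len(transactions))


def is_frequent(itemset, transactions, minsup):
    """True when s(X) >= minsup."""
    return support(itemset, transactions) >= minsup


print(support_count({"Milk", "Bread", "Diaper"}, TRANSACTIONS))  # 2
print(support({"Milk", "Bread", "Diaper"}, TRANSACTIONS))        # 2/5
```

Exact fractions are used so that threshold comparisons such as `s >= minsup` are not affected by floating-point rounding.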
Definition: Association Rule
• Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
– Support (s)
• Fraction of transactions that contain both X and Y
– Confidence (c)
• Measures how often items in Y appear in transactions that contain X
• Example (market-basket transactions above): {Milk, Diaper} → {Beer}
$$s = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{|T|} = \frac{2}{5} = 0.4$$
$$c = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{\sigma(\{\text{Milk}, \text{Diaper}\})} = \frac{2}{3} = 0.67$$
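Both metrics reuse the support count of X ∪ Y; only the denominator differs. A minimal Python sketch follows, under the same assumption that transactions are sets; `rule_metrics` is a hypothetical helper name.

```python
from fractions import Fraction

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule X -> Y as exact fractions."""
    sigma_xy = support_count(set(antecedent) | set(consequent), transactions)
    sigma_x = support_count(antecedent, transactions)
    if sigma_x == 0:
        raise ValueError("confidence is undefined when X never occurs")
    # s = sigma(X u Y) / |T|,  c = sigma(X u Y) / sigma(X)
    return Fraction(sigma_xy, len(transactions)), Fraction(sigma_xy, sigma_x)


s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)
print(f"s = {float(s):.2f}, c = {float(c):.2f}")  # s = 0.40, c = 0.67
```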
Association Rule Mining Task
• Given a set of transactions T, the goal of
association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
⇒ Computationally prohibitive!
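To make the cost concrete, here is a minimal Python sketch of the brute-force approach on the market-basket data: it lists every rule X → Y with X and Y non-empty and disjoint, scores each one, and prunes. The thresholds minsup = 0.4 and minconf = 0.6 are hypothetical values chosen only for illustration; the slides do not fix them here.

```python
from fractions import Fraction
from itertools import combinations

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def brute_force_rules(transactions, minsup, minconf):
    """List all rules X -> Y, compute s and c for each, keep those passing both thresholds."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    listed, kept = 0, []
    for x in nonempty_subsets(items):
        remaining = [i for i in items if i not in x]
        for y in nonempty_subsets(remaining):
            listed += 1
            sigma_xy = support_count(x | y, transactions)
            sigma_x = support_count(x, transactions)
            s = Fraction(sigma_xy, n)
            c = Fraction(sigma_xy, sigma_x) if sigma_x else Fraction(0)
            if s >= minsup and c >= minconf:
                kept.append((x, y, s, c))
    return listed, kept


# Hypothetical thresholds for illustration only.
listed, kept = brute_force_rules(TRANSACTIONS, Fraction(2, 5), Fraction(3, 5))
print(listed, "rules listed,", len(kept), "kept")
```

Even with only six distinct items, every one of the 602 possible rules is scored, although most of them fail the support threshold.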
Mining Association Rules
Example of Rules (on the market-basket transactions above):
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
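The observations can be checked by splitting {Milk, Diaper, Beer} into every X → Y pair. In this minimal Python sketch (helper names are our own), the support is computed once per itemset and only the confidence denominator σ(X) changes from rule to rule.

```python
from fractions import Fraction
from itertools import combinations

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def binary_partitions(itemset):
    """Yield every split of itemset into non-empty X and Y = itemset - X."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for lhs in combinations(items, k):
            x = frozenset(lhs)
            yield x, frozenset(items) - x


itemset = {"Milk", "Diaper", "Beer"}
sigma = support_count(itemset, TRANSACTIONS)
rules = []
for x, y in binary_partitions(itemset):
    s = Fraction(sigma, len(TRANSACTIONS))               # identical for every partition
    c = Fraction(sigma, support_count(x, TRANSACTIONS))  # depends only on X
    rules.append((x, y, s, c))
    print(sorted(x), "->", sorted(y), f"s={float(s):.1f} c={float(c):.2f}")
```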
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup

2. Rule Generation
– Generate high confidence rules from each frequent
itemset, where each rule is a binary partitioning of a
frequent itemset

• Frequent itemset generation is still computationally expensive
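The two steps can be sketched separately in Python. This is a minimal illustration under stated assumptions: step 1 uses plain brute-force enumeration (not Apriori) for brevity, the helper names are hypothetical, and the thresholds minsup = 0.4 and minconf = 0.6 are illustrative values only. Step 2 never rescans the data because every subset X of a frequent itemset is itself frequent, so s(X) is already known from step 1.

```python
from fractions import Fraction
from itertools import combinations

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def frequent_itemsets(transactions, minsup):
    """Step 1 (brute force here): map each itemset with support >= minsup to its support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = Fraction(support_count(combo, transactions), n)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result


def generate_rules(frequent, minconf):
    """Step 2: binary-partition each frequent itemset and keep rules with c >= minconf."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                c = s / frequent[x]  # s(X u Y) / s(X); X is frequent, so it is in the dict
                if c >= minconf:
                    rules.append((x, itemset - x, s, c))
    return rules


frequent = frequent_itemsets(TRANSACTIONS, Fraction(2, 5))
rules = generate_rules(frequent, Fraction(3, 5))
print(len(frequent), "frequent itemsets,", len(rules), "rules")
```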
Frequent Itemset Generation
Itemset lattice over the items A, B, C, D, E:

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Given d items, there are 2^d possible candidate itemsets.
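The lattice can be generated level by level: level k holds all C(d, k) itemsets of size k, and the level sizes sum to 2^d. A minimal Python sketch (the function name `itemset_lattice` is our own) reproduces the figure for d = 5:

```python
from itertools import combinations


def itemset_lattice(items):
    """Return the lattice level by level; level k lists every k-itemset."""
    return [["".join(combo) for combo in combinations(items, k)]
            for k in range(len(items) + 1)]


levels = itemset_lattice("ABCDE")
for level in levels:
    # Level 0 contains only the empty itemset, drawn as "null" in the figure.
    print(" ".join(level) if level != [""] else "null")
print(sum(len(level) for level in levels), "itemsets")  # 32 itemsets
```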
Frequent Itemset Generation
• Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database
[Figure: the N transactions (the market-basket table, TID 1 to 5), each of width w, are matched against a list of M candidates.]
– Match each transaction against every candidate
– Complexity ~ O(NMw) ⇒ expensive since M = 2^d!
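The following minimal Python sketch (the helper name is hypothetical) performs exactly this matching: every transaction is compared against every non-empty candidate, so the number of comparisons is N × M, and each subset test costs up to O(w). It skips the empty itemset, so M = 2^d − 1 here.

```python
from itertools import combinations

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def brute_force_support(transactions):
    """Count every non-empty candidate itemset by matching it against every transaction."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset(c) for k in range(1, len(items) + 1)
                  for c in combinations(items, k)]
    counts = dict.fromkeys(candidates, 0)
    comparisons = 0
    for t in transactions:          # N transactions
        for cand in candidates:     # M candidates
            comparisons += 1
            if cand <= t:           # subset test, up to w item lookups
                counts[cand] += 1
    return counts, comparisons


counts, comparisons = brute_force_support(TRANSACTIONS)
print(len(counts), "candidates,", comparisons, "comparisons")  # 63 candidates, 315 comparisons
```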
Computational Complexity
• Given d unique items:
– Total number of itemsets = 2^d
– Total number of possible association rules:
$$R = \sum_{k=1}^{d-1}\left[\binom{d}{k} \times \sum_{j=1}^{d-k}\binom{d-k}{j}\right] = 3^d - 2^{d+1} + 1$$

If d = 6, R = 602 rules.
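The double sum chooses k items for X and then j of the remaining d − k items for Y. A short Python check, a sketch using `math.comb` (Python 3.8+), confirms that it agrees with the closed form and gives 602 for d = 6:

```python
from math import comb


def rules_double_sum(d):
    """R as the double sum: k items for the antecedent, then j of the other d-k for the consequent."""
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))


def rules_closed_form(d):
    """R = 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


for d in range(1, 8):
    print(d, rules_double_sum(d), rules_closed_form(d))
```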

Frequent Itemset Generation Strategies
• Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M
• Reduce the number of transactions (N)
– Reduce size of N as the size of itemset increases
– Used by DHP and vertical-based mining algorithms
• Reduce the number of comparisons (NM)
– Use efficient data structures to store the
candidates or transactions
– No need to match every candidate against every
transaction
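As one illustration of the vertical-based idea listed above, each item can be stored with the set of TIDs of the transactions that contain it (its TID-list). The support count of an itemset is then the size of the intersection of its items' TID-lists, and no transaction has to be rescanned. This is a minimal Python sketch of the vertical layout only, not DHP or any specific published algorithm, and the function names are hypothetical.

```python
from collections import defaultdict

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def vertical_layout(transactions):
    """Map each item to the set of TIDs (1-based) of the transactions containing it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support_count_vertical(itemset, tidlists):
    """sigma(X) = size of the intersection of the TID-lists of X's items (X must be non-empty)."""
    tid_sets = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets))


tidlists = vertical_layout(TRANSACTIONS)
print(sorted(tidlists["Beer"]))                                            # [2, 3, 4]
print(support_count_vertical({"Milk", "Diaper", "Beer"}, tidlists))        # 2
```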
Reducing Number of Candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must also
be frequent

• Apriori principle holds due to the following property of the support measure:

$$\forall X, Y: (X \subseteq Y) \Rightarrow s(X) \ge s(Y)$$
– Support of an itemset never exceeds the support of its
subsets
– This is known as the anti-monotone property of support
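On a dataset this small, the anti-monotone property can be verified exhaustively. The following minimal Python sketch (our own helper names) compares σ(X) with σ(Y) for every pair X ⊆ Y over the six items. There are 3^6 = 729 such pairs, and no violation is expected.

```python
from itertools import combinations

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def check_anti_monotone(transactions):
    """Count pairs X subset-of Y, and how many break sigma(X) >= sigma(Y)."""
    items = sorted(set().union(*transactions))
    checked, violations = 0, 0
    for k in range(len(items) + 1):
        for y in combinations(items, k):
            sigma_y = support_count(y, transactions)
            for j in range(len(y) + 1):
                for x in combinations(y, j):
                    checked += 1
                    if support_count(x, transactions) < sigma_y:
                        violations += 1
    return checked, violations


print(check_anti_monotone(TRANSACTIONS))  # (729, 0)
```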
Apriori Principle
• If an itemset is frequent, then all of its subsets must also be frequent
• If an itemset is infrequent, then all of its supersets must be infrequent too

$$(X \Rightarrow Y) \Leftrightarrow (\neg Y \Rightarrow \neg X)$$

[Figure: itemset lattice over A, B, C, D, E; all subsets of a frequent itemset are frequent, and all supersets of an infrequent itemset are infrequent.]
Illustrating Apriori Principle
[Figure: itemset lattice over A, B, C, D, E; once an itemset is found to be infrequent, all of its supersets are pruned.]
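The figure does not say in text which itemset was found to be infrequent. The following minimal Python sketch therefore assumes, purely for illustration, that it is {A, B}, and lists the supersets that support-based pruning removes from the lattice. The function name is hypothetical.

```python
from itertools import combinations


def pruned_supersets(infrequent, items):
    """Every proper superset of an infrequent itemset within items; none of these need counting."""
    base = set(infrequent)
    rest = [i for i in items if i not in base]
    return ["".join(sorted(base.union(extra)))
            for k in range(1, len(rest) + 1)
            for extra in combinations(rest, k)]


# Hypothetical: assume {A, B} was found to be infrequent.
print(pruned_supersets("AB", "ABCDE"))
# ['ABC', 'ABD', 'ABE', 'ABCD', 'ABCE', 'ABDE', 'ABCDE']
```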
Illustrating Apriori Principle
Minimum Support = 3

Items (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets)
(No need to generate candidates involving Coke or Eggs)
Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

Triplets (3-itemsets)
Itemset              Count
{Bread,Milk,Diaper}  2   (only TIDs 4 and 5 contain it, so it is below the minimum support of 3)

If every subset is considered: 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates
With support-based pruning: 6 + 6 + 1 = 13 candidates
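The 41 versus 13 comparison can be reproduced in a few lines. This minimal Python sketch (our own helper names) counts candidates level by level for k = 1 to 3 with a support-count threshold of 3. A (k+1)-candidate is kept only if all of its k-subsets are frequent.

```python
from itertools import combinations
from math import comb

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def candidate_counts(transactions, minsup_count, max_k):
    """Total candidates for k = 1..max_k without pruning vs. with support-based pruning."""
    items = sorted(set().union(*transactions))
    exhaustive = sum(comb(len(items), k) for k in range(1, max_k + 1))
    pruned_total = 0
    level = [frozenset([item]) for item in items]  # every 1-itemset is a candidate
    for k in range(1, max_k + 1):
        pruned_total += len(level)
        frequent = {c for c in level if support_count(c, transactions) >= minsup_count}
        # (k+1)-candidates: unions of two frequent k-itemsets whose k-subsets are all frequent
        level = {a | b for a in frequent for b in frequent
                 if len(a | b) == k + 1
                 and all(frozenset(s) in frequent for s in combinations(a | b, k))}
    return exhaustive, pruned_total


print(candidate_counts(TRANSACTIONS, minsup_count=3, max_k=3))  # (41, 13)
```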
Apriori Algorithm
• Method:

– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
• Generate length (k+1) candidate itemsets from length k
frequent itemsets
• Prune candidate itemsets containing subsets of length k that
are infrequent
• Count the support of each candidate by scanning the DB
• Eliminate candidates that are infrequent, leaving only those
that are frequent
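The method above can be sketched compactly in Python. This is a minimal sketch, not a reference implementation. Itemsets are kept as sorted tuples. Length-(k+1) candidates are formed by joining two frequent k-itemsets that share their first k−1 items, which is one common generation strategy and is assumed here. Candidates are then pruned and counted in a single scan per level. `apriori` and `minsup_count` are our own names.

```python
from collections import Counter
from itertools import combinations

# Market-basket transactions (TID 1 to 5), one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def apriori(transactions, minsup_count):
    """Return {sorted itemset tuple: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # k = 1: frequent single items
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {(i,): n for i, n in item_counts.items() if n >= minsup_count}
    result = dict(frequent)
    k = 1
    while frequent:  # repeat until no new frequent itemsets are found
        prev = sorted(frequent)
        prev_set = set(prev)
        candidates = []
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                a, b = prev[i], prev[j]
                if a[:-1] == b[:-1]:  # same first k-1 items
                    cand = a + (b[-1],)
                    # prune: every length-k subset must be frequent
                    if all(sub in prev_set for sub in combinations(cand, k)):
                        candidates.append(cand)
        # count the surviving candidates with one scan of the database
        cand_counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for cand in candidates:
                if t.issuperset(cand):
                    cand_counts[cand] += 1
        frequent = {c: n for c, n in cand_counts.items() if n >= minsup_count}
        result.update(frequent)
        k += 1
    return result


for itemset, count in sorted(apriori(TRANSACTIONS, 3).items()):
    print(itemset, count)
```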
Example
A database has five transactions. Let minsup = 50% and minconf = 80%.

Solution
Step 1: Find all frequent itemsets
Frequent itemsets: {A}, {B}, {C}, {E}, {A, C}, {B, C}, {B, E}, {C, E}, {B, C, E}
Step 2: Generate strong association rules from the frequent itemsets
