
Unit 5: Association Rule Mining and Applications


Reference books:
1. G. K. Gupta, Introduction to Data Mining with Case Studies.
2. Soman, Diwakar, Ajay, Data Mining.
3. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd edition.
4. M. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education.

Module Outline
1. Association rule mining
2. Support and confidence
3. Frequent itemsets, market basket analysis
4. Apriori algorithm
5. Incremental ARM
6. Associative classification - rule mining


Market Basket Analysis

▪ An analytics technique employed by retailers to understand customer purchase behaviour. It is used to determine which items are frequently bought together, i.e. placed in the same basket by customers.
▪ Retailers and online publishers can use it to:
– Change the store layout according to purchase trends
– Improve catalog design
– Cross-market on online stores
– Customize promotions, e.g. emails with add-on sales
– Inform the placement of content items on media sites, or of products in a catalog
– Deliver targeted marketing
Market Basket Analysis: Other Applications

▪ Medical diagnosis
▪ Protein sequences
▪ Fraud detection in credit card transactions
▪ Bio-medical literature
▪ Customer Relationship Management (CRM)
▪ Census data
Counting Co-occurrences

▪ A market basket is a collection of items purchased by a customer in a single customer transaction.
▪ A customer transaction consists of the items purchased from the store in a single visit.
▪ A common goal for a retailer is to identify items that are purchased together - frequent itemsets.

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke

(Each row is one transaction; the same data can also be written as a binary representation of transactions, with one 0/1 column per item.)
Market Basket Analysis

▪ The purpose of market basket analysis is to find interesting relationships among retail products. The results of a market basket analysis help retailers design promotions, arrange shelf or catalog items, and develop cross-marketing strategies.
▪ Association rule algorithms are used to apply market basket analysis to a set of data.
▪ A common goal for retailers is to identify items that are purchased together.
▪ This information can be used to improve the layout of goods in a store or the layout of catalog pages.
Counting Co-occurrences

▪ A market basket is a collection of items purchased by a customer in a single customer transaction.
▪ A customer transaction consists of the items purchased in a single store visit, a single order through a mail-order catalog, or an order placed at a store on the web.
▪ A common goal for a retailer is to identify items that are purchased together.
Frequent Itemset
The Purchases Relation for Market Basket Analysis

transid  custid  date     item   qty
111      201     5/1/99   Pen    2
111      201     5/1/99   Ink    1
111      201     5/1/99   Milk   3
111      201     5/1/99   Juice  6
112      105     6/3/99   Pen    1
112      105     6/3/99   Ink    1
112      105     6/3/99   Milk   1
113      106     5/10/99  Pen    1
113      106     5/10/99  Milk   1
114      201     6/1/99   Pen    2
114      201     6/1/99   Ink    2
114      201     6/1/99   Juice  4
114      201     6/1/99   Water  1
Frequent Itemsets: Terminology

▪ A set of items is called an itemset.
▪ The support of an itemset is the fraction of transactions in the database that contain all the items in the itemset.

OR

▪ The support supp(X) of an itemset X is defined as the proportion of transactions in the data set that contain the itemset X.
Frequent Itemsets

▪ For example, {pen, ink} has 75% support in Purchases (it appears in 3 of the 4 transactions).
▪ We thus conclude that pen and ink are items that are frequently purchased together.
▪ On the other hand, {milk, juice} are not frequently purchased together.
▪ A user can specify a minimum support (minsup) and find all itemsets whose support is above minsup. Such itemsets are called frequent itemsets.
▪ These itemsets may be singleton sets.
▪ If the user-specified minimum support is 70%, the frequent itemsets are {pen}, {ink}, {milk}, {pen, ink}, {pen, milk}.
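Below is a minimal sketch of these support computations in Python. The transactions are the four baskets from the Purchases relation above; the helper name support is illustrative, not part of the original notes.

from itertools import combinations

# The four transactions (111-114) from the Purchases relation.
transactions = [
    {"pen", "ink", "milk", "juice"},   # 111
    {"pen", "ink", "milk"},            # 112
    {"pen", "milk"},                   # 113
    {"pen", "ink", "juice", "water"},  # 114
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"pen", "ink"}, transactions))     # 0.75
print(support({"milk", "juice"}, transactions))  # 0.25

# All frequent itemsets of size 1 and 2 at minsup = 70%.
items = sorted(set().union(*transactions))
for k in (1, 2):
    for candidate in combinations(items, k):
        if support(candidate, transactions) >= 0.70:
            print(candidate)

This prints exactly the frequent itemsets listed above: {pen}, {ink}, {milk}, {pen, ink}, {pen, milk}.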
Algorithm to identify frequent itemsets
The algorithm to identify frequent itemsets is based on a simple but fundamental property of frequent itemsets:

▪ The a priori property: every subset of a frequent itemset is also a frequent itemset.
▪ By considering only itemsets obtained by enlarging frequent itemsets, the number of candidate frequent itemsets is greatly reduced; this optimization is crucial for efficient execution.
Frequent Pattern Analysis

▪ Frequent pattern - a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set.
▪ Motivation - finding inherent regularities in data:
– What products are often purchased together? Bread and butter?
– What are the subsequent purchases after buying a PC?
– Can we automatically classify web documents?
▪ Applications
– Market basket data analysis, cross-marketing, catalog design, sale campaign analysis, web log (clickstream) analysis, DNA sequence analysis.
Association Rule Mining

▪ Finds interesting associations and relationships among large sets of data items.
▪ Shows how frequently an itemset occurs in a transaction.
▪ Association rule mining is one of the ways to find patterns in data. It finds:
– features (dimensions) that occur together
– features (dimensions) that are "correlated"
▪ Initially used for market basket analysis, to find how items purchased by customers are related.
▪ Example: Bread → Milk [sup = 5%, conf = 100%]
When to use Association Rules

▪ Association rule mining is suitable for non-numeric, categorical data.
▪ We can use association rules in any dataset where features take only two values, i.e. 0/1. Some examples:
– Market basket analysis is a popular application of association rules.
– People who visit webpage X are likely to visit webpage Y.
– People in age group [30, 40] with income > $100k are likely to own a home.
Definitions

▪ Itemset
– A set of one or more items, e.g. {Milk, Bread, Diaper}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Definitions (continued, over the same five transactions)

▪ Support count (σ)
– Frequency of occurrence of an itemset (the number of transactions it appears in)
– E.g. σ({Milk, Bread, Diaper}) = 2
▪ Support (s)
– Fraction of the transactions in which an itemset appears
– E.g. s({Milk, Bread, Diaper}) = 2/5
▪ Frequent (large) itemset
– An itemset whose support is greater than or equal to a minsup threshold
– The user specifies the minimum support (minsup) and finds all itemsets whose support is above minsup - the frequent itemsets
Association Rule

▪ An implication expression of the form X → Y, where X and Y are itemsets (defined over transactions such as the five listed above).
Mining for Rules
Many algorithms have been proposed for discovering various forms of rules that briefly describe the data.

Association Rules

An association rule has the form LHS → RHS, where both LHS and RHS are sets of items.
The interpretation of such a rule is that if every item in LHS is purchased in a transaction, then it is likely that the items in RHS are purchased as well.
Association Rule
By examining the set of transactions in Purchases, we can identify rules of the form:
{pen} → {ink}
This is read as "If a pen is purchased in a transaction, it is likely that ink is also purchased in that transaction".
It is a statement that describes the transactions in the database.
There are two important measures for an association rule:
– Support
– Confidence
Important Measures for an Association Rule

Support:
– The support for a set of items is the percentage of transactions that contain all of these items.
– The support for a rule LHS → RHS is the support for the set of items LHS ∪ RHS.
– The support of the itemset {pen, ink} (in the previous example) is 75%.

Confidence:
– Let sup(LHS) be the percentage of transactions that contain LHS, and let sup(LHS ∪ RHS) be the percentage of transactions that contain both LHS and RHS.
– The confidence is sup(LHS ∪ RHS) / sup(LHS).
– The confidence of the rule {pen} → {ink} is 75% / 100% = 75%.
– The confidence of a rule is an indication of the strength of the rule.
Rule Evaluation Metrics

▪ Measures of rule interestingness that reflect the usefulness and certainty of discovered rules.
▪ Support (s)
– Fraction of transactions that contain both X and Y
▪ Confidence (c)
– Measures how often items in Y appear in transactions that contain X

Example, over the five transactions above: {Milk, Diaper} → Beer

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 = 0.67
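A minimal sketch of these metrics in Python, checked against {Milk, Diaper} → Beer; the sigma helper returns the support count, and all names are illustrative:

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_metrics(lhs, rhs):
    """(support, confidence) of the rule lhs -> rhs."""
    s = sigma(lhs | rhs) / len(transactions)  # fraction containing X and Y
    c = sigma(lhs | rhs) / sigma(lhs)         # how often Y appears given X
    return s, c

print(sigma({"Milk", "Bread", "Diaper"}))          # 2
print(rule_metrics({"Milk", "Diaper"}, {"Beer"}))  # (0.4, 0.666...)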
Support

▪ The support of an itemset X is the percentage of transactions that contain X, i.e. how often X occurs in the database.
▪ The support of a rule LHS → RHS is the fraction of transactions that contain both LHS and RHS:
▪ Supp(LHS → RHS) = Supp(LHS ∪ RHS) = support count(LHS ∪ RHS) / total no. of transactions
▪ Supp(C → A)?
▪ Supp(C → A) = 2/5 = 40%.
Confidence

▪ The confidence of a rule is an indication of the strength of the rule.
▪ Conf(LHS → RHS) = Supp(LHS ∪ RHS) / Supp(LHS)
▪ Supp(LHS ∪ RHS) = % of transactions that contain both LHS and RHS.
▪ Supp(LHS) = % of transactions that contain LHS.
▪ Conf(C → A)?
▪ Conf(C → A) = 2/4 = 50%
Association Rule Mining Task

▪ Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Another example
To illustrate the concepts, we use a small example from the supermarket domain. The set of items is I = {milk, bread, butter, potato}, and a small database contains the items (1 codes presence and 0 absence of an item in a transaction):

Transaction id  Milk  Bread  Butter  Potato
1               1     1      0       0
2               0     1      1       0
3               0     0      0       1
4               1     1      1       0
5               0     1      0       0

An example rule for the supermarket could be {milk, bread} ➔ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Conf(X ➔ Y) = supp(X ∪ Y) / supp(X)

For the itemset {milk, bread}, the support is 2/5 = 0.4, i.e. it occurs in 40% of all transactions (2 out of 5 transactions).
For the itemset {milk, bread, butter}, the support is 1/5 = 0.2.
Conf({milk, bread} ➔ {butter}) = 0.2/0.4 = 0.5, which means that for 50% of the transactions containing milk and bread the rule is correct.
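The same numbers can be checked directly on the 0/1 matrix. A minimal sketch, assuming the column order of the table above (the cols mapping and support helper are illustrative):

# Binary (0/1) representation of the five supermarket transactions.
data = [
    [1, 1, 0, 0],   # transaction 1
    [0, 1, 1, 0],   # transaction 2
    [0, 0, 0, 1],   # transaction 3
    [1, 1, 1, 0],   # transaction 4
    [0, 1, 0, 0],   # transaction 5
]
cols = {"milk": 0, "bread": 1, "butter": 2, "potato": 3}

def support(itemset):
    """Fraction of rows whose columns for `itemset` are all 1."""
    idx = [cols[i] for i in itemset]
    return sum(all(row[j] for j in idx) for row in data) / len(data)

s_xy = support({"milk", "bread"})             # 0.4
s_xyz = support({"milk", "bread", "butter"})  # 0.2
print(s_xyz / s_xy)                           # confidence = 0.5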
Algorithm to identify frequent itemsets
The algorithm proceeds iteratively, first identifying frequent itemsets with just one item.
In each subsequent iteration, frequent itemsets identified in the previous iteration are extended with another item to generate larger candidate itemsets.
By considering only itemsets obtained by enlarging frequent itemsets, we greatly reduce the number of candidate frequent itemsets; this optimization is crucial for efficient execution.
The a priori property guarantees that this optimization is correct, that is, we don't miss any frequent itemsets.
A single scan of all transactions suffices to determine which candidate itemsets generated in an iteration are frequent itemsets.
The algorithm terminates when no new frequent itemsets are identified in an iteration.
Algorithm to identify frequent itemsets
First, we count the support of each item.
We drop the 1-itemsets that have support below the desired cut-off value, to create the list of frequent 1-itemsets.
The general procedure to obtain k-itemsets from (k-1)-itemsets, for k = 2, 3, ..., is as follows:
Create a candidate list of k-itemsets by performing a join operation on pairs of (k-1)-itemsets in the list.
A pair is combined only if the first (k-2) items are the same in both itemsets. (When k = 2, this simply means that all possible pairs are to be combined.)
If the condition is met, the join of the pair is a k-itemset that contains the common first (k-2) items and the two items that are not in common, one from each member of the pair.
Algorithm to identify frequent itemsets

All frequent k-itemsets must be in this candidate list, since every subset of size (k-1) of a frequent k-itemset must be a frequent (k-1)-itemset.
The k-itemsets in the candidate list that are not frequent k-itemsets need to be deleted.
To identify the itemsets that are not frequent, we examine all subsets of size (k-1) of each candidate k-itemset.
We need to examine only the (k-1)-itemsets that contain the last two items of the candidate k-itemset.
If any one of these subsets of size (k-1) is not present in the frequent (k-1)-itemset list, we know that the candidate k-itemset cannot be a frequent itemset. A sketch of the full procedure follows.
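The following is a minimal Python sketch of this level-wise procedure. It is a simplified reading of the notes: the prune step checks all (k-1)-subsets of each candidate rather than only those containing the last two items, and the function name and itemset representation (sorted tuples) are illustrative.

from itertools import combinations

def apriori_frequent_itemsets(transactions, minsup_count):
    """Level-wise frequent itemset mining: join, prune, then one scan per level."""
    # L1: count single items and keep those meeting the support-count cut-off.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {iset for iset, c in counts.items() if c >= minsup_count}
    frequent = {iset: counts[iset] for iset in current}

    k = 2
    while current:
        # Join: combine pairs of (k-1)-itemsets whose first (k-2) items agree.
        candidates = set()
        for a, b in combinations(sorted(current), 2):
            if a[:-1] == b[:-1]:
                candidates.add(tuple(sorted(set(a) | set(b))))
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(s in current for s in combinations(c, k - 1))}
        # Scan: a single pass over the transactions counts all candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            t = set(t)
            for c in candidates:
                if set(c) <= t:
                    counts[c] += 1
        current = {c for c in candidates if counts[c] >= minsup_count}
        frequent.update((c, counts[c]) for c in current)
        k += 1
    return frequent

Applied to the five bread/cheese/juice transactions of the next slides with a support-count cut-off of 3, this returns exactly L1 = {Bread, Cheese, Juice, Milk} and L2 = {{Bread, Juice}, {Cheese, Juice}}.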
Mining Association Rules (Apriori Algorithm)

▪ Two-step approach:
– Frequent itemset generation: find all frequent itemsets with support ≥ a pre-determined min_support count.
– Rule generation: list all association rules from the frequent itemsets; calculate support and confidence for all rules; prune the rules that fail the min_support and min_confidence thresholds.
▪ Use the Apriori algorithm.
Algorithm for finding Association Rules

A user can ask for all association rules that have a specified minimum support (minsup) and minimum confidence (minconf).
The algorithm proceeds in two steps:
1. All frequent itemsets with the user-specified minimum support are computed.
2. Rules are generated using the frequent itemsets as input.

Consider a frequent itemset X with support s_X identified in the first step. To generate a rule from X, divide X into two itemsets, LHS and RHS. From the a priori property, we know that the support of LHS is larger than minsup. The confidence of the rule LHS → RHS is s_X / s_LHS; we then check how this ratio compares to minconf.
Algorithm for finding Association Rules
• To find the association rules from the frequent itemsets, we take a large frequent itemset, say p, and find each nonempty subset a.
• The rule a → (p - a) is possible if it satisfies the confidence requirement. The confidence of this rule is given by support(p) / support(a).
• When considering rules like a → (p - a), it is possible to make the rule generation process more efficient as follows:
• We only want rules that have the minimum confidence required.
• Since confidence is given by support(p) / support(a), it is clear that if for some a the rule a → (p - a) does not have the minimum confidence, then all rules b → (p - b), where b is a subset of a, will also lack the confidence, since support(b) cannot be smaller than support(a).
• It therefore makes sense to generate all subsets a in a recursive fashion; once we find a subset a for which the rule a → (p - a) does not have minimum confidence, no smaller subsets need to be checked.
• As an example, if ABC → D does not have the minimum confidence, then AB → CD and A → BCD will also lack the confidence.
Algorithm for finding Association Rules

• Another way to improve rule generation is to consider rules like (p - a) → a. If this rule has the minimum confidence, then all rules (p - b) → b will also have minimum confidence if b is a subset of a, since (p - b) has more items than (p - a) (given that b is smaller than a) and so cannot have support higher than that of (p - a).
• E.g. if A → BCD has the minimum confidence, then all rules like AB → CD, AC → BD and ABC → D will also have the minimum confidence. A sketch of the basic rule-generation step follows.
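A minimal sketch of rule generation, reusing the frequent dictionary produced by the apriori_frequent_itemsets sketch above (names illustrative; for brevity it tests every split directly instead of applying the recursive pruning shortcuts just described):

from itertools import combinations

def generate_rules(frequent, minconf):
    """Emit (LHS, RHS, confidence) for every rule a -> (p - a) meeting minconf.

    `frequent` maps sorted item tuples to their support counts.
    """
    rules = []
    for p, sup_p in frequent.items():
        if len(p) < 2:
            continue
        for r in range(1, len(p)):          # every nonempty proper subset a of p
            for a in combinations(p, r):
                conf = sup_p / frequent[a]  # confidence = support(p) / support(a)
                if conf >= minconf:
                    rhs = tuple(i for i in p if i not in a)
                    rules.append((a, rhs, conf))
    return rules

# Example: the bread/cheese/juice data of the next slides, minconf = 0.7.
T = [{"Bread", "Cheese", "Eggs", "Juice"}, {"Bread", "Cheese", "Juice"},
     {"Bread", "Milk", "Yoghurt"}, {"Bread", "Juice", "Milk"},
     {"Cheese", "Milk", "Juice"}]
for lhs, rhs, conf in generate_rules(apriori_frequent_itemsets(T, 3), 0.7):
    print(lhs, "->", rhs, round(conf, 2))

The frequent[a] lookup cannot fail: by the a priori property every subset of a frequent itemset is itself frequent. At minconf = 0.7 this prints the four rules Bread → Juice, Juice → Bread, Cheese → Juice and Juice → Cheese.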
Apriori : Transactions

Transaction ID  Items
100             Bread, Cheese, Eggs, Juice
200             Bread, Cheese, Juice
300             Bread, Milk, Yoghurt
400             Bread, Juice, Milk
500             Cheese, Milk, Juice
Apriori : Frequent items L1

Items   Frequency
Bread   4
Cheese  3
Juice   4
Milk    3
Apriori : Candidate item pairs C2

Item pairs        Frequency
{Bread, Cheese}   2
{Bread, Juice}    3
{Bread, Milk}     2
{Cheese, Juice}   3
{Cheese, Milk}    1
{Juice, Milk}     2
Apriori : Frequent item pairs L2
• There are two frequent item pairs, {Bread, Juice} and {Cheese, Juice}: this is L2.
• From these two frequent 2-itemsets we do not obtain a candidate 3-itemset, since the two 2-itemsets do not have the same first item.
• The two frequent 2-itemsets lead to the following possible rules:
Bread → Juice
Juice → Bread
Cheese → Juice
Juice → Cheese
Example 1

Transaction  Itemset
1            a, b, c
2            a, c
3            a, d
4            b, e, f

▪ minsupport = 50%
▪ minconfidence = 50%
▪ Support count = (minsupport/100) × total no. of transactions = (50/100) × 4 = 2

C1 (candidate itemsets):
items  support
a      3
b      2
c      2
d      1
e      1
f      1

L1 (large itemsets, support ≥ 2):
items  support
a      3
b      2
c      2

C2 (candidate pairs):
items  support
a, b   1
a, c   2
b, c   1

L2:
items  support
a, c   2
Example 1 (continued)

▪ minsupport = 50%
▪ minconfidence = 50%

L2:
items  support
a, c   2

Association rule  Support  Confidence                               Confidence %
a → c             2        = support / occurrences of "a" = 2/3 ≈ 0.66   66% > minconf
c → a             2        = support / occurrences of "c" = 2/2 = 1      100% > minconf

The final rules are a → c and c → a.


University Question

▪ Find the association rules with 50% support and 75% confidence for the data below.
– Also find the support and confidence of the following rules:
– Laptop ➔ Card reader
– Laptop, Mobile ➔ Card reader

Trans id  Items
1         Card reader, Memory card, Mobile, Laptop
2         Card reader, Mobile, Laptop
3         Digi cam, Laptop, LCD TV
4         Card reader, Digi cam, Laptop
5         Card reader, Digi cam, Mobile
UoM 1 (Method 1)

tid  Itemset
1    Card reader, Memory card, Mobile, Laptop
2    Card reader, Mobile, Laptop
3    Digi cam, Laptop, LCD TV
4    Card reader, Digi cam, Laptop
5    Card reader, Digi cam, Mobile

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = (minsupport/100) × total no. of transactions = (50/100) × 5 = 2.5, taken here as 2

C1:
items             support
Card reader (CR)  4
Memory card (MC)  1
Mobile (M)        3
Laptop (L)        4
Digi cam (DC)     3
LCD TV (LTV)      1

L1:
items  support
CR     4
M      3
L      4
DC     3

C2:
items   support
CR, M   3
CR, L   3
CR, DC  2
M, L    2
M, DC   1
L, DC   2

L2:
items   support
CR, M   3
CR, L   3
CR, DC  2
M, L    2
L, DC   2

C3:
items      support
CR, M, L   2
CR, M, DC  1
CR, L, DC  1
M, L, DC   0

L3:
items      support
CR, M, L   2
UoM 1 (Method 1, continued)

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = 2

L3:
items      support
CR, M, L   2

Rules from L3 (covering Laptop ➔ Card reader and Laptop, Mobile ➔ Card reader):

Association rule  Support  Confidence  Confidence %
CR → M, L         2        2/4         50%
M → CR, L         2        2/3         66%
L → CR, M         2        2/4         50%
CR, M → L         2        2/3         66%
M, L → CR         2        2/2         100% > minconf
L, CR → M         2        2/3         66%
L → CR            3        3/4         75% = minconf
...
UoM 1 (Method 2)

(same transactions as above)

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = (minsupport/100) × total no. of transactions = (50/100) × 5 = 2.5, rounded up to 3

C1:
items  support
CR     4
MC     1
M      3
L      4
DC     3
LTV    1

L1:
items  support
CR     4
M      3
L      4
DC     3

C2:
items   support
CR, M   3
CR, L   3
CR, DC  2
M, L    2
M, DC   1
L, DC   2

L2:
items   support
CR, M   3
CR, L   3

C3:
items      support
CR, M, L   2

L3: empty (the only candidate, {CR, M, L}, has support 2 < 3)
UoM 1 (Method 2, continued)

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = 3

L2:
items   support
CR, M   3
CR, L   3

Association rule  Support  Confidence  Confidence %
CR → M            3        3/4         75% = minconf
CR → L            3        3/4         75% = minconf
M → CR            3        3/3         100% > minconf
L → CR            3        3/4         75% = minconf
Example 2

Transaction  Itemset
1            m, o, n, k, e, y
2            d, o, n, k, e, y
3            m, a, k, e
4            m, u, c, k, y
5            c, o, o, k, i, e

▪ minsupport = 60%
▪ minconfidence = 80%
▪ Support count = (minsupport/100) × total no. of transactions = (60/100) × 5 = 3

C1:
items  support
m      3
o      3
n      2
k      5
e      4
y      3
d      1
a      1
u      1
c      2
i      1

L1:
items  support
m      3
o      3
k      5
e      4
y      3

C2:
items  support
m, o   1
m, k   3
m, e   2
m, y   2
o, k   3
o, e   3
o, y   2
k, e   4
k, y   3
e, y   2

L2:
items  support
m, k   3
o, k   3
o, e   3
k, e   4
k, y   3
Example 2 (continued)

▪ minsupport = 60%
▪ minconfidence = 80%

C3:
items    support
m, k, o  1
m, k, e  2
m, k, y  2
o, k, e  3
o, k, y  2
k, e, y  2

L3:
items    support
o, k, e  3

C4 is not possible, as only one frequent 3-itemset is present.

Association rule  Support  Confidence                                   Confidence %
o, k → e          3        = support / occurrences of o & k = 3/3 = 1   100% > minconf
o, e → k          3        = 3/3 = 1                                    100% > minconf
k, e → o          3        = 3/4 = 0.75                                 75%
e → o, k          3        = 3/4 = 0.75                                 75%
k → o, e          3        = 3/5 = 0.6                                  60%
o → k, e          3        = 3/3 = 1                                    100% > minconf
Practice 1
Practice 2
Apriori Algorithm
▪ Note: the user specifies 2 parameters: minsupport and minconf.
▪ Step 1 (Find Frequent Itemsets / Large Itemsets)
– Scan all transactions and find the frequent itemsets (FIs) having one item, i.e. those having support above minsupport. Let these be L1.
▪ Iterations:
– Recursively generate L2, L3, etc. (using the previous frequent itemsets) until FIs of all sizes are generated.
– The general procedure to obtain k-itemsets from (k-1)-itemsets for k = 2, 3, ... is as follows:
– Create a candidate list Ck of k-itemsets by performing a join operation on pairs of (k-1)-itemsets in the list Lk-1. A pair is combined only if the first (k-2) items are the same in both itemsets.
– Scan the transactions (DB) to find Lk from Ck.
Apriori Algorithm (contd.)

Step 2 (Find Association Rules)
– Generate the rules by dividing each large itemset Lk into an LHS and an RHS part, such that LHS ➔ RHS is a valid association rule if it meets the minconf requirement. A library-based sketch of the full two-step workflow follows.
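For completeness, a sketch of the same two-step workflow, assuming the third-party mlxtend library is installed (an assumption; the library is not mentioned in these notes). The data is the card-reader example from the earlier University Question.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

dataset = [
    ["Card reader", "Memory card", "Mobile", "Laptop"],
    ["Card reader", "Mobile", "Laptop"],
    ["Digi cam", "Laptop", "LCD TV"],
    ["Card reader", "Digi cam", "Laptop"],
    ["Card reader", "Digi cam", "Mobile"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(dataset).transform(dataset), columns=te.columns_)

# Step 1: frequent itemsets with support >= 50%.
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Step 2: association rules with confidence >= 75%.
rules = association_rules(frequent, metric="confidence", min_threshold=0.75)
print(rules[["antecedents", "consequents", "support", "confidence"]])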
Practice

▪ Find the association rules with 25% support and 70% confidence for the following data:

Trans id  Items
1         Biscuits, Bread, Cheese, Coffee, Yogurt
2         Bread, Cereal, Cheese, Coffee
3         Cheese, Chocolate, Donuts, Juice, Milk
4         Bread, Cheese, Coffee, Cereal, Juice
5         Bread, Cereal, Chocolate, Donuts, Juice
6         Milk, Tea
7         Biscuits, Bread, Cheese, Coffee, Milk
8         Egg, Milk, Tea
9         Bread, Cereal, Cheese, Chocolate, Coffee
10        Bread, Cereal, Chocolate, Donuts, Juice
Advantages and Disadvantages of the Apriori Algorithm

Associative Classification
Classification

▪ Organize and categorize data into distinct classes (i.e. assign them class labels).
Associative Classification

▪ Association rules are generated and analyzed for use in classification.
▪ Search for strong associations between frequent patterns and class labels.
▪ Classification: based on evaluating a set of rules of the form p1 ∧ p2 ∧ ... ∧ pl → "Aclass = C" (conf, sup).
▪ Steps in associative classification:
– Generate all class association rules (CARs).
– Build a classifier using the generated CARs.
Associative Classification (AC) Problem
▪ Given a labeled training data set, the problem is to derive a set of class association rules (CARs) from the training data set which satisfy certain user constraints, i.e. support and confidence thresholds.
▪ Common associative classification algorithms:
– CBA (Classification Based on Associations): mines all possible rules of the form cond-set (a set of attribute-value pairs) → class label
– CPAR (Classification based on Predictive Association Rules)
– CMAR (Classification based on Multiple Association Rules)
– MCAR
AC Steps
Rule support and confidence for AC

▪ Given a training data set T, for a rule R: P → C
– Support of R, sup(R): the number of rows matching the condition of R and having class label C.
– Confidence of R, conf(R): the number of rows matching the condition of R and having class label C, divided by the number of rows matching the condition of R.
– Any itemset whose support is larger than the user minimum support is called a frequent itemset.
Rule Generator: Basic Concepts

▪ Frequent rule items
– A rule item is frequent if its support is above minsup.
▪ Accurate rule
– A rule is accurate if its confidence is above minconf.
▪ The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.
Classification using ARM

TID  Items                       Gender
1    Bread, Milk                 F
2    Bread, Diaper, Beer, Eggs   M
3    Milk, Diaper, Beer, Coke    M
4    Bread, Milk, Diaper, Beer   M
5    Bread, Milk, Diaper, Coke   F

Min support: 25%
Min confidence: 70%

In a classification task we want to predict the class label (Gender) using the attributes.

A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% (3/5) and confidence is 100% (3/3). A sketch of mining such class association rules follows.
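A minimal sketch of mining class association rules from this table, with the class label (Gender) fixed as the rule consequent. The restriction to cond-sets of size 1 and 2 and all names are illustrative:

from itertools import combinations

rows = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]
MINSUP, MINCONF = 0.25, 0.70
items = sorted(set().union(*(attrs for attrs, _ in rows)))
n = len(rows)

# Keep every rule cond-set -> class meeting both thresholds.
for k in (1, 2):
    for cond in combinations(items, k):
        cond = set(cond)
        matching = [label for attrs, label in rows if cond <= attrs]
        for label in ("M", "F"):
            sup = matching.count(label) / n  # rule support
            if sup >= MINSUP:
                conf = matching.count(label) / len(matching)  # rule confidence
                if conf >= MINCONF:
                    print(sorted(cond), "->", label,
                          f"sup={sup:.2f} conf={conf:.2f}")

Among its output is ['Beer', 'Diaper'] -> M with sup=0.60 and conf=1.00, the rule quoted above.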
CBA Example

(Possible rule items from Table 1 of the CBA example.)
Questions
• What is a frequent itemset? What is the a priori property? Describe an algorithm for finding frequent itemsets.
• What is association rule mining? Explain the applications of association rule mining with an example. Explain the Apriori algorithm with an example.
• Find the association rules, with all possible support and confidence percentages, from the following sample data:

T1  Bread, Jelly, Butter
T2  Bread, Butter
T3  Bread, Milk, Butter
T4  Juice, Bread
T5  Juice, Milk

• Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 = 22%) and let the minimum confidence required be 70%. Answer the following:
– Find the frequent itemsets using the Apriori algorithm.
– Give a few association rules using the minimum support and minimum confidence.

T1  I1, I2, I5
T2  I2, I4
T3  I2, I3
T4  I1, I2, I4
T5  I1, I3
T6  I2, I3
T7  I1, I3
T8  I1, I2, I3, I5
T9  I1, I2, I3
