
Unit 5: Association Rule Mining and Applications


Reference books:
1. G. K. Gupta, Introduction to Data Mining with Case Studies.
2. Soman, Diwakar, Ajay, Data Mining.
3. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd edition.
4. M. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education.

Module Outline
1. Association rule mining
2. Support and confidence
3. Frequent itemsets, market basket analysis
4. Apriori algorithm
5. Incremental ARM
6. Associative classification - rule mining


Market Basket Analysis

▪ An analytics technique employed by retailers to understand customer purchase behaviour. It is used to determine which items are frequently bought together, i.e. placed in the same basket by customers.
▪ Retailers and online publishers can use it to:
– Change the store layout according to purchase trends
– Improve catalog design
– Cross-market on online stores
– Customize promotions, e.g. emails with add-on sales
– Inform the placement of content items on media sites, or of products in a catalog
– Deliver targeted marketing
Market Basket Analysis: Other Applications

▪ Medical diagnosis
▪ Protein sequences
▪ Fraud detection in credit card transactions
▪ Bio-medical literature
▪ Customer Relationship Management (CRM)
▪ Census data
Counting Co-occurrences

▪ A market basket is a collection of items purchased by a customer in a single customer transaction.
▪ A customer transaction consists of the items purchased from the store in a single visit.
▪ A common goal for a retailer is to identify items that are purchased together - frequent itemsets.

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke

(Each row is one transaction; the same data can also be written as a binary representation of transactions, with one 0/1 column per item.)
Market Basket Analysis

▪ The purpose of market basket analysis is to find interesting relationships among retail products. The results of a market basket analysis help retailers design promotions, arrange shelf or catalog items, and develop cross-marketing strategies.
▪ Association rule algorithms are used to apply market basket analysis to a set of data.
▪ A common goal for retailers is to identify items that are purchased together.
▪ This information can be used to improve the layout of goods in a store or the layout of catalog pages.
Counting Co-occurrences

▪ A market basket is a collection of items purchased by a customer in a single customer transaction.
▪ A customer transaction consists of the items purchased in a single store visit, a single order through a mail-order catalog, or an order placed at a store on the web.
▪ A common goal for a retailer is to identify items that are purchased together.
Frequent Itemset
The Purchases Relation for Market Basket Analysis

transid  custid  date     item   qty
111      201     5/1/99   Pen    2
111      201     5/1/99   Ink    1
111      201     5/1/99   Milk   3
111      201     5/1/99   Juice  6
112      105     6/3/99   Pen    1
112      105     6/3/99   Ink    1
112      105     6/3/99   Milk   1
113      106     5/10/99  Pen    1
113      106     5/10/99  Milk   1
114      201     6/1/99   Pen    2
114      201     6/1/99   Ink    2
114      201     6/1/99   Juice  4
114      201     6/1/99   Water  1
Frequent Itemsets: Terminology

▪ A set of items is called an itemset.
▪ The support of an itemset is the fraction of transactions in the database that contain all the items in the itemset.

OR

▪ The support supp(X) of an itemset X is defined as the proportion of transactions in the data set that contain the itemset X.
Frequent Itemsets

▪ For example, {pen, ink} has 75% support in Purchases (it appears in 3 of the 4 transactions).
▪ We thus conclude that pen and ink are items that are frequently purchased together.
▪ On the other hand, {milk, juice} are not frequently purchased together.
▪ A user can specify a minimum support (minsup) and find all itemsets whose support is above minsup. Such itemsets are called frequent itemsets.
▪ These itemsets may be singleton sets.
▪ If the user-specified minimum support is 70%, the frequent itemsets are {pen}, {ink}, {milk}, {pen, ink}, {pen, milk}.
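Below is a minimal sketch of these support computations in Python. The transactions are the four baskets from the Purchases relation above; the helper name support is illustrative, not part of the original notes.

from itertools import combinations

# The four transactions (111-114) from the Purchases relation.
transactions = [
    {"pen", "ink", "milk", "juice"},   # 111
    {"pen", "ink", "milk"},            # 112
    {"pen", "milk"},                   # 113
    {"pen", "ink", "juice", "water"},  # 114
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"pen", "ink"}, transactions))     # 0.75
print(support({"milk", "juice"}, transactions))  # 0.25

# All frequent itemsets of size 1 and 2 at minsup = 70%.
items = sorted(set().union(*transactions))
for k in (1, 2):
    for candidate in combinations(items, k):
        if support(candidate, transactions) >= 0.70:
            print(candidate)

This prints exactly the frequent itemsets listed above: {pen}, {ink}, {milk}, {pen, ink}, {pen, milk}.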
Algorithm to identify frequent itemsets
The algorithm to identify frequent itemsets is based on a simple but fundamental property of frequent itemsets:

▪ The a priori property: every subset of a frequent itemset is also a frequent itemset.
▪ By considering only itemsets obtained by enlarging frequent itemsets, the number of candidate frequent itemsets is greatly reduced; this optimization is crucial for efficient execution.
Frequent Pattern Analysis

▪ Frequent pattern - a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set.
▪ Motivation - finding inherent regularities in data:
– What products are often purchased together? Bread and butter?
– What are the subsequent purchases after buying a PC?
– Can we automatically classify web documents?
▪ Applications
– Market basket data analysis, cross-marketing, catalog design, sale campaign analysis, web log (clickstream) analysis, DNA sequence analysis.
Association Rule Mining

▪ Finds interesting associations and relationships among large sets of data items.
▪ Shows how frequently an itemset occurs in a transaction.
▪ Association rule mining is one of the ways to find patterns in data. It finds:
– features (dimensions) that occur together
– features (dimensions) that are "correlated"
▪ Initially used for market basket analysis, to find how items purchased by customers are related.
▪ Example: Bread → Milk [sup = 5%, conf = 100%]
When to use Association Rules

▪ Association rule mining is suitable for non-numeric, categorical data.
▪ We can use association rules in any dataset where features take only two values, i.e. 0/1. Some examples:
– Market basket analysis is a popular application of association rules.
– People who visit webpage X are likely to visit webpage Y.
– People in age group [30, 40] with income > $100k are likely to own a home.
Definitions

▪ Itemset
– A set of one or more items, e.g. {Milk, Bread, Diaper}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Definitions (continued, over the same five transactions)

▪ Support count (σ)
– Frequency of occurrence of an itemset (the number of transactions it appears in)
– E.g. σ({Milk, Bread, Diaper}) = 2
▪ Support (s)
– Fraction of the transactions in which an itemset appears
– E.g. s({Milk, Bread, Diaper}) = 2/5
▪ Frequent (large) itemset
– An itemset whose support is greater than or equal to a minsup threshold
– The user specifies the minimum support (minsup) and finds all itemsets whose support is above minsup - the frequent itemsets
Association Rule

▪ An implication expression of the form X → Y, where X and Y are itemsets (defined over transactions such as the five listed above).
Mining for Rules
Many algorithms have been proposed for discovering various forms of rules that briefly describe the data.

Association Rules

An association rule has the form LHS → RHS, where both LHS and RHS are sets of items.
The interpretation of such a rule is that if every item in LHS is purchased in a transaction, then it is likely that the items in RHS are purchased as well.
Association Rule
By examining the set of transactions in Purchases, we can identify rules of the form:
{pen} → {ink}
This is read as "If a pen is purchased in a transaction, it is likely that ink is also purchased in that transaction".
It is a statement that describes the transactions in the database.
There are two important measures for an association rule:
– Support
– Confidence
Important Measures for an Association Rule

Support:
– The support for a set of items is the percentage of transactions that contain all of these items.
– The support for a rule LHS → RHS is the support for the set of items LHS ∪ RHS.
– The support of the itemset {pen, ink} (in the previous example) is 75%.

Confidence:
– Let sup(LHS) be the percentage of transactions that contain LHS, and let sup(LHS ∪ RHS) be the percentage of transactions that contain both LHS and RHS.
– The confidence is sup(LHS ∪ RHS) / sup(LHS).
– The confidence of the rule {pen} → {ink} is 75% / 100% = 75%.
– The confidence of a rule is an indication of the strength of the rule.
Rule Evaluation Metrics

▪ Measures of rule interestingness that reflect the usefulness and certainty of discovered rules.
▪ Support (s)
– Fraction of transactions that contain both X and Y
▪ Confidence (c)
– Measures how often items in Y appear in transactions that contain X

Example, over the five transactions above: {Milk, Diaper} → Beer

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 = 0.67
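A minimal sketch of these metrics in Python, checked against {Milk, Diaper} → Beer; the sigma helper returns the support count, and all names are illustrative:

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_metrics(lhs, rhs):
    """(support, confidence) of the rule lhs -> rhs."""
    s = sigma(lhs | rhs) / len(transactions)  # fraction containing X and Y
    c = sigma(lhs | rhs) / sigma(lhs)         # how often Y appears given X
    return s, c

print(sigma({"Milk", "Bread", "Diaper"}))          # 2
print(rule_metrics({"Milk", "Diaper"}, {"Beer"}))  # (0.4, 0.666...)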
Support

▪ The support of an itemset X is the percentage of transactions that contain X, i.e. how often X occurs in the database.
▪ The support of a rule LHS → RHS is the fraction of transactions that contain both LHS and RHS:
▪ Supp(LHS → RHS) = Supp(LHS ∪ RHS) = support count(LHS ∪ RHS) / total no. of transactions
▪ Supp(C → A)?
▪ Supp(C → A) = 2/5 = 40%.
Confidence

▪ The confidence of a rule is an indication of the strength of the rule.
▪ Conf(LHS → RHS) = Supp(LHS ∪ RHS) / Supp(LHS)
▪ Supp(LHS ∪ RHS) = % of transactions that contain both LHS and RHS.
▪ Supp(LHS) = % of transactions that contain LHS.
▪ Conf(C → A)?
▪ Conf(C → A) = 2/4 = 50%
Association Rule Mining Task

▪ Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Another example
To illustrate the concepts, we use a small example from the supermarket domain. The set of items is I = {milk, bread, butter, potato}, and a small database contains the items (1 codes presence and 0 absence of an item in a transaction):

Transaction id  Milk  Bread  Butter  Potato
1               1     1      0       0
2               0     1      1       0
3               0     0      0       1
4               1     1      1       0
5               0     1      0       0

An example rule for the supermarket could be {milk, bread} ➔ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Conf(X ➔ Y) = supp(X ∪ Y) / supp(X)

For the itemset {milk, bread}, the support is 2/5 = 0.4, i.e. it occurs in 40% of all transactions (2 out of 5 transactions).
For the itemset {milk, bread, butter}, the support is 1/5 = 0.2.
Conf({milk, bread} ➔ {butter}) = 0.2/0.4 = 0.5, which means that for 50% of the transactions containing milk and bread the rule is correct.
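The same numbers can be checked directly on the 0/1 matrix. A minimal sketch, assuming the column order of the table above (the cols mapping and support helper are illustrative):

# Binary (0/1) representation of the five supermarket transactions.
data = [
    [1, 1, 0, 0],   # transaction 1
    [0, 1, 1, 0],   # transaction 2
    [0, 0, 0, 1],   # transaction 3
    [1, 1, 1, 0],   # transaction 4
    [0, 1, 0, 0],   # transaction 5
]
cols = {"milk": 0, "bread": 1, "butter": 2, "potato": 3}

def support(itemset):
    """Fraction of rows whose columns for `itemset` are all 1."""
    idx = [cols[i] for i in itemset]
    return sum(all(row[j] for j in idx) for row in data) / len(data)

s_xy = support({"milk", "bread"})             # 0.4
s_xyz = support({"milk", "bread", "butter"})  # 0.2
print(s_xyz / s_xy)                           # confidence = 0.5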
Algorithm to identify frequent itemsets
The algorithm proceeds iteratively, first identifying frequent itemsets with just one item.
In each subsequent iteration, frequent itemsets identified in the previous iteration are extended with another item to generate larger candidate itemsets.
By considering only itemsets obtained by enlarging frequent itemsets, we greatly reduce the number of candidate frequent itemsets; this optimization is crucial for efficient execution.
The a priori property guarantees that this optimization is correct, that is, we don't miss any frequent itemsets.
A single scan of all transactions suffices to determine which candidate itemsets generated in an iteration are frequent itemsets.
The algorithm terminates when no new frequent itemsets are identified in an iteration.
Algorithm to identify frequent itemsets
First, we count the support of each item.
We drop the 1-itemsets that have support below the desired cut-off value, to create the list of frequent 1-itemsets.
The general procedure to obtain k-itemsets from (k-1)-itemsets, for k = 2, 3, ..., is as follows:
Create a candidate list of k-itemsets by performing a join operation on pairs of (k-1)-itemsets in the list.
A pair is combined only if the first (k-2) items are the same in both itemsets. (When k = 2, this simply means that all possible pairs are to be combined.)
If the condition is met, the join of the pair is a k-itemset that contains the common first (k-2) items and the two items that are not in common, one from each member of the pair.
Algorithm to identify frequent itemsets

All frequent k-itemsets must be in this candidate list, since every subset of size (k-1) of a frequent k-itemset must be a frequent (k-1)-itemset.
The k-itemsets in the candidate list that are not frequent k-itemsets need to be deleted.
To identify the itemsets that are not frequent, we examine all subsets of size (k-1) of each candidate k-itemset.
We need to examine only the (k-1)-itemsets that contain the last two items of the candidate k-itemset.
If any one of these subsets of size (k-1) is not present in the frequent (k-1)-itemset list, we know that the candidate k-itemset cannot be a frequent itemset. A sketch of the full procedure follows.
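The following is a minimal Python sketch of this level-wise procedure. It is a simplified reading of the notes: the prune step checks all (k-1)-subsets of each candidate rather than only those containing the last two items, and the function name and itemset representation (sorted tuples) are illustrative.

from itertools import combinations

def apriori_frequent_itemsets(transactions, minsup_count):
    """Level-wise frequent itemset mining: join, prune, then one scan per level."""
    # L1: count single items and keep those meeting the support-count cut-off.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {iset for iset, c in counts.items() if c >= minsup_count}
    frequent = {iset: counts[iset] for iset in current}

    k = 2
    while current:
        # Join: combine pairs of (k-1)-itemsets whose first (k-2) items agree.
        candidates = set()
        for a, b in combinations(sorted(current), 2):
            if a[:-1] == b[:-1]:
                candidates.add(tuple(sorted(set(a) | set(b))))
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(s in current for s in combinations(c, k - 1))}
        # Scan: a single pass over the transactions counts all candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            t = set(t)
            for c in candidates:
                if set(c) <= t:
                    counts[c] += 1
        current = {c for c in candidates if counts[c] >= minsup_count}
        frequent.update((c, counts[c]) for c in current)
        k += 1
    return frequent

Applied to the five bread/cheese/juice transactions of the next slides with a support-count cut-off of 3, this returns exactly L1 = {Bread, Cheese, Juice, Milk} and L2 = {{Bread, Juice}, {Cheese, Juice}}.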
Mining Association Rules (Apriori Algorithm)

▪ Two-step approach:
– Frequent itemset generation: find all frequent itemsets with support ≥ a pre-determined min_support count.
– Rule generation: list all association rules from the frequent itemsets; calculate support and confidence for all rules; prune the rules that fail the min_support and min_confidence thresholds.
▪ Use the Apriori algorithm.
Algorithm for finding Association Rules

A user can ask for all association rules that have a specified minimum support (minsup) and minimum confidence (minconf).
The algorithm proceeds in two steps:
1. All frequent itemsets with the user-specified minimum support are computed.
2. Rules are generated using the frequent itemsets as input.

Consider a frequent itemset X with support s_X identified in the first step. To generate a rule from X, divide X into two itemsets, LHS and RHS. From the a priori property, we know that the support of LHS is larger than minsup. The confidence of the rule LHS → RHS is s_X / s_LHS; we then check how this ratio compares to minconf.
Algorithm for finding Association Rules
• To find the association rules from the frequent itemsets, we take a large frequent itemset, say p, and find each nonempty subset a.
• The rule a → (p - a) is possible if it satisfies the confidence requirement. The confidence of this rule is given by support(p) / support(a).
• When considering rules like a → (p - a), it is possible to make the rule generation process more efficient as follows:
• We only want rules that have the minimum confidence required.
• Since confidence is given by support(p) / support(a), it is clear that if for some a the rule a → (p - a) does not have the minimum confidence, then all rules b → (p - b), where b is a subset of a, will also lack the confidence, since support(b) cannot be smaller than support(a).
• It therefore makes sense to generate all subsets a in a recursive fashion; once we find a subset a for which the rule a → (p - a) does not have minimum confidence, no smaller subsets need to be checked.
• As an example, if ABC → D does not have the minimum confidence, then AB → CD and A → BCD will also lack the confidence.
Algorithm for finding Association Rules

• Another way to improve rule generation is to consider rules like (p - a) → a. If this rule has the minimum confidence, then all rules (p - b) → b will also have minimum confidence if b is a subset of a, since (p - b) has more items than (p - a) (given that b is smaller than a) and so cannot have support higher than that of (p - a).
• E.g. if A → BCD has the minimum confidence, then all rules like AB → CD, AC → BD and ABC → D will also have the minimum confidence. A sketch of the basic rule-generation step follows.
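A minimal sketch of rule generation, reusing the frequent dictionary produced by the apriori_frequent_itemsets sketch above (names illustrative; for brevity it tests every split directly instead of applying the recursive pruning shortcuts just described):

from itertools import combinations

def generate_rules(frequent, minconf):
    """Emit (LHS, RHS, confidence) for every rule a -> (p - a) meeting minconf.

    `frequent` maps sorted item tuples to their support counts.
    """
    rules = []
    for p, sup_p in frequent.items():
        if len(p) < 2:
            continue
        for r in range(1, len(p)):          # every nonempty proper subset a of p
            for a in combinations(p, r):
                conf = sup_p / frequent[a]  # confidence = support(p) / support(a)
                if conf >= minconf:
                    rhs = tuple(i for i in p if i not in a)
                    rules.append((a, rhs, conf))
    return rules

# Example: the bread/cheese/juice data of the next slides, minconf = 0.7.
T = [{"Bread", "Cheese", "Eggs", "Juice"}, {"Bread", "Cheese", "Juice"},
     {"Bread", "Milk", "Yoghurt"}, {"Bread", "Juice", "Milk"},
     {"Cheese", "Milk", "Juice"}]
for lhs, rhs, conf in generate_rules(apriori_frequent_itemsets(T, 3), 0.7):
    print(lhs, "->", rhs, round(conf, 2))

The frequent[a] lookup cannot fail: by the a priori property every subset of a frequent itemset is itself frequent. At minconf = 0.7 this prints the four rules Bread → Juice, Juice → Bread, Cheese → Juice and Juice → Cheese.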
Apriori : Transactions

Transaction ID  Items
100             Bread, Cheese, Eggs, Juice
200             Bread, Cheese, Juice
300             Bread, Milk, Yoghurt
400             Bread, Juice, Milk
500             Cheese, Milk, Juice
Apriori : Frequent items L1

Items   Frequency
Bread   4
Cheese  3
Juice   4
Milk    3
Apriori : Candidate item pairs C2

Item pairs        Frequency
{Bread, Cheese}   2
{Bread, Juice}    3
{Bread, Milk}     2
{Cheese, Juice}   3
{Cheese, Milk}    1
{Juice, Milk}     2
Apriori : Frequent item pairs L2
• There are two frequent item pairs, {Bread, Juice} and {Cheese, Juice}: this is L2.
• From these two frequent 2-itemsets we do not obtain a candidate 3-itemset, since the two 2-itemsets do not have the same first item.
• The two frequent 2-itemsets lead to the following possible rules:
Bread → Juice
Juice → Bread
Cheese → Juice
Juice → Cheese
Example 1

Transaction  Itemset
1            a, b, c
2            a, c
3            a, d
4            b, e, f

▪ minsupport = 50%
▪ minconfidence = 50%
▪ Support count = (minsupport/100) × total no. of transactions = (50/100) × 4 = 2

C1 (candidate itemsets):
items  support
a      3
b      2
c      2
d      1
e      1
f      1

L1 (large itemsets, support ≥ 2):
items  support
a      3
b      2
c      2

C2 (candidate pairs):
items  support
a, b   1
a, c   2
b, c   1

L2:
items  support
a, c   2
Example 1 (continued)

▪ minsupport = 50%
▪ minconfidence = 50%

L2:
items  support
a, c   2

Association rule  Support  Confidence                               Confidence %
a → c             2        = support / occurrences of "a" = 2/3 ≈ 0.66   66% > minconf
c → a             2        = support / occurrences of "c" = 2/2 = 1      100% > minconf

The final rules are a → c and c → a.


University Question

▪ Find the association rules with 50% support and 75% confidence for the data below.
– Also find the support and confidence of the following rules:
– Laptop ➔ Card reader
– Laptop, Mobile ➔ Card reader

Trans id  Items
1         Card reader, Memory card, Mobile, Laptop
2         Card reader, Mobile, Laptop
3         Digi cam, Laptop, LCD TV
4         Card reader, Digi cam, Laptop
5         Card reader, Digi cam, Mobile
UoM 1 (Method 1)

tid  Itemset
1    Card reader, Memory card, Mobile, Laptop
2    Card reader, Mobile, Laptop
3    Digi cam, Laptop, LCD TV
4    Card reader, Digi cam, Laptop
5    Card reader, Digi cam, Mobile

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = (minsupport/100) × total no. of transactions = (50/100) × 5 = 2.5, taken here as 2

C1:
items             support
Card reader (CR)  4
Memory card (MC)  1
Mobile (M)        3
Laptop (L)        4
Digi cam (DC)     3
LCD TV (LTV)      1

L1:
items  support
CR     4
M      3
L      4
DC     3

C2:
items   support
CR, M   3
CR, L   3
CR, DC  2
M, L    2
M, DC   1
L, DC   2

L2:
items   support
CR, M   3
CR, L   3
CR, DC  2
M, L    2
L, DC   2

C3:
items      support
CR, M, L   2
CR, M, DC  1
CR, L, DC  1
M, L, DC   0

L3:
items      support
CR, M, L   2
UoM 1 (Method 1, continued)

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = 2

L3:
items      support
CR, M, L   2

Rules from L3 (covering Laptop ➔ Card reader and Laptop, Mobile ➔ Card reader):

Association rule  Support  Confidence  Confidence %
CR → M, L         2        2/4         50%
M → CR, L         2        2/3         66%
L → CR, M         2        2/4         50%
CR, M → L         2        2/3         66%
M, L → CR         2        2/2         100% > minconf
L, CR → M         2        2/3         66%
L → CR            3        3/4         75% = minconf
...
UoM 1 (Method 2)

(same transactions as above)

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = (minsupport/100) × total no. of transactions = (50/100) × 5 = 2.5, rounded up to 3

C1:
items  support
CR     4
MC     1
M      3
L      4
DC     3
LTV    1

L1:
items  support
CR     4
M      3
L      4
DC     3

C2:
items   support
CR, M   3
CR, L   3
CR, DC  2
M, L    2
M, DC   1
L, DC   2

L2:
items   support
CR, M   3
CR, L   3

C3:
items      support
CR, M, L   2

L3: empty (the only candidate, {CR, M, L}, has support 2 < 3)
UoM 1 (Method 2, continued)

▪ minsupport = 50%
▪ minconfidence = 75%
▪ Support count = 3

L2:
items   support
CR, M   3
CR, L   3

Association rule  Support  Confidence  Confidence %
CR → M            3        3/4         75% = minconf
CR → L            3        3/4         75% = minconf
M → CR            3        3/3         100% > minconf
L → CR            3        3/4         75% = minconf
Example 2

Transaction  Itemset
1            m, o, n, k, e, y
2            d, o, n, k, e, y
3            m, a, k, e
4            m, u, c, k, y
5            c, o, o, k, i, e

▪ minsupport = 60%
▪ minconfidence = 80%
▪ Support count = (minsupport/100) × total no. of transactions = (60/100) × 5 = 3

C1:
items  support
m      3
o      3
n      2
k      5
e      4
y      3
d      1
a      1
u      1
c      2
i      1

L1:
items  support
m      3
o      3
k      5
e      4
y      3

C2:
items  support
m, o   1
m, k   3
m, e   2
m, y   2
o, k   3
o, e   3
o, y   2
k, e   4
k, y   3
e, y   2

L2:
items  support
m, k   3
o, k   3
o, e   3
k, e   4
k, y   3
Example 2 (continued)

▪ minsupport = 60%
▪ minconfidence = 80%

C3:
items    support
m, k, o  1
m, k, e  2
m, k, y  2
o, k, e  3
o, k, y  2
k, e, y  2

L3:
items    support
o, k, e  3

C4 is not possible, as only one frequent 3-itemset is present.

Association rule  Support  Confidence                                   Confidence %
o, k → e          3        = support / occurrences of o & k = 3/3 = 1   100% > minconf
o, e → k          3        = 3/3 = 1                                    100% > minconf
k, e → o          3        = 3/4 = 0.75                                 75%
e → o, k          3        = 3/4 = 0.75                                 75%
k → o, e          3        = 3/5 = 0.6                                  60%
o → k, e          3        = 3/3 = 1                                    100% > minconf
Practice 1
Practice 2
Apriori Algorithm
▪ Note: the user specifies 2 parameters: minsupport and minconf.
▪ Step 1 (Find Frequent Itemsets / Large Itemsets)
– Scan all transactions and find the frequent itemsets (FIs) having one item, i.e. those having support above minsupport. Let these be L1.
▪ Iterations:
– Recursively generate L2, L3, etc. (using the previous frequent itemsets) until FIs of all sizes are generated.
– The general procedure to obtain k-itemsets from (k-1)-itemsets for k = 2, 3, ... is as follows:
– Create a candidate list Ck of k-itemsets by performing a join operation on pairs of (k-1)-itemsets in the list Lk-1. A pair is combined only if the first (k-2) items are the same in both itemsets.
– Scan the transactions (DB) to find Lk from Ck.
Apriori Algorithm (contd.)

Step 2 (Find Association Rules)
– Generate the rules by dividing each large itemset Lk into an LHS and an RHS part, such that LHS ➔ RHS is a valid association rule if it meets the minconf requirement. A library-based sketch of the full two-step workflow follows.
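For completeness, a sketch of the same two-step workflow, assuming the third-party mlxtend library is installed (an assumption; the library is not mentioned in these notes). The data is the card-reader example from the earlier University Question.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

dataset = [
    ["Card reader", "Memory card", "Mobile", "Laptop"],
    ["Card reader", "Mobile", "Laptop"],
    ["Digi cam", "Laptop", "LCD TV"],
    ["Card reader", "Digi cam", "Laptop"],
    ["Card reader", "Digi cam", "Mobile"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(dataset).transform(dataset), columns=te.columns_)

# Step 1: frequent itemsets with support >= 50%.
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Step 2: association rules with confidence >= 75%.
rules = association_rules(frequent, metric="confidence", min_threshold=0.75)
print(rules[["antecedents", "consequents", "support", "confidence"]])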
Practice

▪ Find the association rules with 25% support and 70% confidence for the following data:

Trans id  Items
1         Biscuits, Bread, Cheese, Coffee, Yogurt
2         Bread, Cereal, Cheese, Coffee
3         Cheese, Chocolate, Donuts, Juice, Milk
4         Bread, Cheese, Coffee, Cereal, Juice
5         Bread, Cereal, Chocolate, Donuts, Juice
6         Milk, Tea
7         Biscuits, Bread, Cheese, Coffee, Milk
8         Egg, Milk, Tea
9         Bread, Cereal, Cheese, Chocolate, Coffee
10        Bread, Cereal, Chocolate, Donuts, Juice
Advantages and Disadvantages of the Apriori Algorithm

Associative Classification
Classification

▪ Organize and categorize data into distinct classes (i.e. assign them class labels).
Associative Classification

▪ Association rules are generated and analyzed for use in classification.
▪ Search for strong associations between frequent patterns and class labels.
▪ Classification: based on evaluating a set of rules of the form p1 ∧ p2 ∧ ... ∧ pl → "Aclass = C" (conf, sup).
▪ Steps in associative classification:
– Generate all class association rules (CARs).
– Build a classifier using the generated CARs.
Associative Classification (AC) Problem
▪ Given a labeled training data set, the problem is to derive a set of class association rules (CARs) from the training data set which satisfy certain user constraints, i.e. support and confidence thresholds.
▪ Common associative classification algorithms:
– CBA (Classification Based on Associations): mines all possible rules of the form cond-set (a set of attribute-value pairs) → class label
– CPAR (Classification based on Predictive Association Rules)
– CMAR (Classification based on Multiple Association Rules)
– MCAR
AC Steps
Rule support and confidence for AC

▪ Given a training data set T, for a rule R: P → C
– Support of R, sup(R): the number of rows matching the condition of R and having class label C.
– Confidence of R, conf(R): the number of rows matching the condition of R and having class label C, divided by the number of rows matching the condition of R.
– Any itemset whose support is larger than the user minimum support is called a frequent itemset.
Rule Generator: Basic Concepts

▪ Frequent rule items
– A rule item is frequent if its support is above minsup.
▪ Accurate rule
– A rule is accurate if its confidence is above minconf.
▪ The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.
Classification using ARM

TID  Items                       Gender
1    Bread, Milk                 F
2    Bread, Diaper, Beer, Eggs   M
3    Milk, Diaper, Beer, Coke    M
4    Bread, Milk, Diaper, Beer   M
5    Bread, Milk, Diaper, Coke   F

Min support: 25%
Min confidence: 70%

In a classification task we want to predict the class label (Gender) using the attributes.

A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% (3/5) and confidence is 100% (3/3). A sketch of mining such class association rules follows.
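A minimal sketch of mining class association rules from this table, with the class label (Gender) fixed as the rule consequent. The restriction to cond-sets of size 1 and 2 and all names are illustrative:

from itertools import combinations

rows = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]
MINSUP, MINCONF = 0.25, 0.70
items = sorted(set().union(*(attrs for attrs, _ in rows)))
n = len(rows)

# Keep every rule cond-set -> class meeting both thresholds.
for k in (1, 2):
    for cond in combinations(items, k):
        cond = set(cond)
        matching = [label for attrs, label in rows if cond <= attrs]
        for label in ("M", "F"):
            sup = matching.count(label) / n  # rule support
            if sup >= MINSUP:
                conf = matching.count(label) / len(matching)  # rule confidence
                if conf >= MINCONF:
                    print(sorted(cond), "->", label,
                          f"sup={sup:.2f} conf={conf:.2f}")

Among its output is ['Beer', 'Diaper'] -> M with sup=0.60 and conf=1.00, the rule quoted above.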
CBA Example

(Possible rule items from Table 1 of the CBA example.)
Questions
• What is a frequent itemset? What is the a priori property? Describe an algorithm for finding frequent itemsets.
• What is association rule mining? Explain the applications of association rule mining with an example. Explain the Apriori algorithm with an example.
• Find the association rules, with all possible support and confidence percentages, from the following sample data:

T1  Bread, Jelly, Butter
T2  Bread, Butter
T3  Bread, Milk, Butter
T4  Juice, Bread
T5  Juice, Milk

• Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 = 22%) and let the minimum confidence required be 70%. Answer the following:
– Find the frequent itemsets using the Apriori algorithm.
– Give a few association rules using the minimum support and minimum confidence.

T1  I1, I2, I5
T2  I2, I4
T3  I2, I3
T4  I1, I2, I4
T5  I1, I3
T6  I2, I3
T7  I1, I3
T8  I1, I2, I3, I5
T9  I1, I2, I3
