106 Unsupervised Learning - Association Rules
UNSUPERVISED LEARNING - ASSOCIATION RULES
SUNIL GORANTLA
INTRODUCTION
• A method for discovering interesting relationships (associations)
between variables in large datasets.
• Commonly used in market basket analysis to identify products
that frequently co-occur in transactions.
• Example: If a customer buys bread, they are likely to buy butter.
FREQUENCY AND SUPPORT
• Frequent Itemsets:
• An itemset is a collection of one or more items.
• An itemset is considered frequent if it appears in the dataset
at least as many times as a pre-specified threshold (minimum
support).
• Support:
• The proportion of transactions that contain the itemset.
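• Formula: $\text{Support}(X) = \frac{\text{Number of transactions containing } X}{\text{Total number of transactions}}$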
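CONFIDENCE AND LIFT
• Confidence:
• The likelihood that a transaction containing itemset X also contains
itemset Y.
• Formula: $\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$
• Lift:
• The ratio of the observed support of X ∪ Y to the support expected if
X and Y were independent.
• Formula: $\text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)}$
• Lift > 1 indicates a positive association; lift = 1 indicates independence.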
• Example item counts: Tomato = 100, Bread = 40, Jam = 15
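The three measures can be checked by hand with a minimal Python sketch. It reuses the five-transaction example (T1-T5) from later in these notes, because the full data behind the Tomato/Bread/Jam counts is not given. The helper names `support`, `confidence` and `lift` are illustrative and not from any library. `Fraction` is used so the results are exact rather than rounded floats.

```python
from fractions import Fraction

# The five transactions (T1-T5) from the example later in these notes.
transactions = [
    {"Bread", "Milk", "Butter"},  # T1
    {"Bread", "Milk"},            # T2
    {"Milk", "Butter"},           # T3
    {"Bread", "Butter"},          # T4
    {"Bread", "Milk", "Butter"},  # T5
]


def support(itemset, transactions):
    """Proportion of transactions containing every item of `itemset` (exact fraction)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """Confidence(X -> Y) = Support(X ∪ Y) / Support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Lift(X -> Y) = Support(X ∪ Y) / (Support(X) * Support(Y))."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))


print(support({"Bread"}, transactions))               # 4/5
print(support({"Bread", "Milk"}, transactions))       # 3/5
print(confidence({"Bread"}, {"Milk"}, transactions))  # 3/4
print(lift({"Bread"}, {"Milk"}, transactions))        # 15/16
```

A lift of 15/16 is slightly below 1. In this tiny example, Bread and Milk therefore co-occur a little less often than independence would predict, even though the rule's confidence is 75%.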
APRIORI ALGORITHM
• Step 1: Generate candidate itemsets of length 1.
• Step 2: Prune the itemsets that do not meet the minimum
support threshold.
• Step 3: Generate candidate itemsets of length 2 from the
frequent itemsets of length 1.
• Step 4: Repeat Steps 2-3, building candidates of length k+1 from the
frequent itemsets of length k, until no more frequent itemsets can be
generated.
• Step 5: Generate association rules from the frequent itemsets.
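The five steps can be sketched in Python as follows. This is a minimal, unoptimised sketch, not a reference implementation. The function names `apriori` and `rules` are hypothetical. The minimum support is assumed to be an absolute transaction count, and the threshold values in the usage lines are arbitrary. The candidate-join step also applies the standard Apriori property: a candidate is dropped if any of its smaller subsets is infrequent. That property is not spelled out in the steps above.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Level-wise frequent-itemset mining.

    Returns {frozenset(itemset): support_count}. `min_support_count` is an
    absolute number of transactions (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data per level.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return counts

    # Step 1: candidate itemsets of length 1.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Step 2: prune candidates below the minimum support.
        level = {c: n for c, n in count(candidates).items() if n >= min_support_count}
        frequent.update(level)
        # Steps 3-4: join frequent k-itemsets into (k+1)-candidates, keeping
        # only those whose k-subsets are all frequent (Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Step 5: yield (X, Y, confidence) for each rule X -> Y that passes the threshold."""
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = n / frequent[lhs]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, conf


data = [
    {"Bread", "Milk", "Butter"},
    {"Bread", "Milk"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
]
freq = apriori(data, min_support_count=3)
print(len(freq))  # 6: three 1-itemsets and three 2-itemsets
for x, y, c in rules(freq, min_confidence=0.7):
    print(sorted(x), "->", sorted(y), c)
```

Each level rescans every transaction. That repeated scanning is the cost ECLAT's vertical layout is designed to avoid.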
ECLAT
• ECLAT (Equivalence Class Clustering and bottom-up Lattice
Traversal) is an efficient algorithm for finding frequent itemsets.
• Unlike Apriori, which uses a horizontal data layout (each transaction
lists its items), ECLAT uses a vertical data layout (each item lists the
IDs of the transactions that contain it).
• Developed as an alternative to Apriori to handle large datasets
more efficiently.
TID AND SUPPORT
• Transaction ID Sets (TID Sets): In ECLAT, each item is
represented by the set of transaction IDs where it appears.
• The algorithm computes intersections of TID sets to find frequent
itemsets.
• Support:
• The support count of an itemset is the size of its TID set; dividing
it by the total number of transactions gives the support.
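The relationships above can be written compactly as follows. Here N denotes the total number of transactions, a symbol introduced only for this summary. The worked intersection uses the Bread and Milk TID sets from the example later in these notes.

```latex
\text{support count}(X) = |\mathrm{TID}(X)|, \qquad
\text{Support}(X) = \frac{|\mathrm{TID}(X)|}{N}, \qquad
\mathrm{TID}(X \cup Y) = \mathrm{TID}(X) \cap \mathrm{TID}(Y)

\mathrm{TID}(\{\text{Bread}\}) \cap \mathrm{TID}(\{\text{Milk}\})
  = \{T1, T2, T4, T5\} \cap \{T1, T2, T3, T5\}
  = \{T1, T2, T5\}
  \;\Rightarrow\; \text{support count}(\{\text{Bread}, \text{Milk}\}) = 3
```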
ECLAT ALGORITHM
• Step 1: Convert the dataset into a vertical format where each item is
associated with a TID set.
• Step 2: Calculate the support for individual items by counting the
number of transactions in their TID sets.
• Step 3: Generate larger itemsets by intersecting TID sets of smaller
itemsets (e.g., pairwise intersections for 2-itemsets).
• Step 4: Prune itemsets that do not meet the minimum support threshold.
• Step 5: Continue this process until no larger frequent itemsets can be
generated.
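The steps above can be sketched in Python as a depth-first search over equivalence classes. In each class, itemsets share a common prefix, and each pair is extended by intersecting its TID sets. This is an illustrative sketch only. The function name `eclat` is hypothetical, and `min_support_count` is assumed to be an absolute transaction count.

```python
def eclat(tidsets, min_support_count):
    """Depth-first frequent-itemset mining on a vertical layout.

    `tidsets` maps item -> set of transaction IDs. Returns
    {frozenset(itemset): support_count}.
    """
    frequent = {}

    def mine(klass):
        # `klass` holds (itemset, tids) pairs that share a common prefix
        # (an equivalence class); every entry is already frequent.
        for i, (items_a, tids_a) in enumerate(klass):
            frequent[items_a] = len(tids_a)  # support count = size of the TID set
            next_klass = []
            for items_b, tids_b in klass[i + 1:]:
                tids = tids_a & tids_b  # Step 3: TID set of the combined itemset
                if len(tids) >= min_support_count:  # Step 4: prune infrequent itemsets
                    next_klass.append((items_a | items_b, tids))
            if next_klass:
                mine(next_klass)  # Step 5: grow longer itemsets within this class

    # Steps 1-2: single items (already in vertical format) that meet the minimum support.
    singles = [
        (frozenset([item]), frozenset(tids))
        for item, tids in sorted(tidsets.items())
        if len(tids) >= min_support_count
    ]
    mine(singles)
    return frequent


vertical = {
    "Bread": {"T1", "T2", "T4", "T5"},
    "Milk": {"T1", "T2", "T3", "T5"},
    "Butter": {"T1", "T3", "T4", "T5"},
}
freq = eclat(vertical, min_support_count=3)
print(len(freq))                           # 6
print(freq[frozenset({"Bread", "Milk"})])  # 3
```

Support is read directly from TID-set sizes, so the original transactions are never rescanned after the initial conversion to vertical format.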
EXAMPLE
Transaction ID   Items Purchased
T1               {Bread, Milk, Butter}
T2               {Bread, Milk}
T3               {Milk, Butter}
T4               {Bread, Butter}
T5               {Bread, Milk, Butter}
DATA LAYOUT
• Apriori:
  • T1: {Bread, Milk, Butter}
  • T2: {Bread, Milk}
  • T3: {Milk, Butter}
  • T4: {Bread, Butter}
  • T5: {Bread, Milk, Butter}
• ECLAT:
  • Bread: {T1, T2, T4, T5}
  • Milk: {T1, T2, T3, T5}
  • Butter: {T1, T3, T4, T5}
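Converting from the horizontal layout to the vertical one takes a single pass over the transactions. The following is a minimal Python sketch; the function name `to_vertical` is hypothetical.

```python
from collections import defaultdict


def to_vertical(horizontal):
    """Turn {tid: items} (horizontal layout) into {item: set of tids} (vertical layout)."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)  # record that `item` appears in transaction `tid`
    return dict(vertical)


horizontal = {
    "T1": {"Bread", "Milk", "Butter"},
    "T2": {"Bread", "Milk"},
    "T3": {"Milk", "Butter"},
    "T4": {"Bread", "Butter"},
    "T5": {"Bread", "Milk", "Butter"},
}
vertical = to_vertical(horizontal)
print(sorted(vertical["Bread"]))  # ['T1', 'T2', 'T4', 'T5']
```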
FREQUENT 1-ITEMSET GENERATION
• Apriori:
  • {Bread}: 4 transactions
  • {Milk}: 4 transactions
  • {Butter}: 4 transactions
• ECLAT:
  • Bread: {T1, T2, T4, T5} (4 transactions)
  • Milk: {T1, T2, T3, T5} (4 transactions)
  • Butter: {T1, T3, T4, T5} (4 transactions)
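There are five transactions, so each 1-itemset's support follows directly from the support formula and its TID-set size:

```latex
\text{Support}(\{\text{Bread}\}) = \frac{|\{T1, T2, T4, T5\}|}{5} = \frac{4}{5} = 0.8
\text{Support}(\{\text{Milk}\}) = \frac{|\{T1, T2, T3, T5\}|}{5} = \frac{4}{5} = 0.8
\text{Support}(\{\text{Butter}\}) = \frac{|\{T1, T3, T4, T5\}|}{5} = \frac{4}{5} = 0.8
```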
FREQUENT 2-ITEMSET GENERATION
• Apriori:
  • {Bread, Milk}: 3 transactions
  • {Bread, Butter}: 3 transactions
  • {Milk, Butter}: 3 transactions
• ECLAT:
  • {Bread, Milk}: {T1, T2, T5} (3 transactions)
  • {Bread, Butter}: {T1, T4, T5} (3 transactions)
  • {Milk, Butter}: {T1, T3, T5} (3 transactions)
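The two columns above can be reproduced side by side with a small Python sketch. The Apriori-style count scans every transaction for each pair, while the ECLAT-style count intersects two TID sets. The variable names are illustrative only. The `assert` confirms that both approaches give the same support count for every pair.

```python
from itertools import combinations

horizontal = {
    "T1": {"Bread", "Milk", "Butter"},
    "T2": {"Bread", "Milk"},
    "T3": {"Milk", "Butter"},
    "T4": {"Bread", "Butter"},
    "T5": {"Bread", "Milk", "Butter"},
}
vertical = {
    "Bread": {"T1", "T2", "T4", "T5"},
    "Milk": {"T1", "T2", "T3", "T5"},
    "Butter": {"T1", "T3", "T4", "T5"},
}

pair_tids = {}
for a, b in combinations(["Bread", "Milk", "Butter"], 2):
    # Apriori-style: one pass over all transactions per candidate pair.
    scan_count = sum(1 for items in horizontal.values() if {a, b} <= items)
    # ECLAT-style: intersect the two TID sets; no pass over the transactions.
    tids = vertical[a] & vertical[b]
    assert scan_count == len(tids)  # both layouts give the same support count
    pair_tids[(a, b)] = tids
    print(f"{{{a}, {b}}}: {sorted(tids)} ({len(tids)} transactions)")

# {Bread, Milk}: ['T1', 'T2', 'T5'] (3 transactions)
# {Bread, Butter}: ['T1', 'T4', 'T5'] (3 transactions)
# {Milk, Butter}: ['T1', 'T3', 'T5'] (3 transactions)
```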