Data Mining Chapter 2: Market Basket Analysis
This chapter covers machine learning methods for identifying associations among items in transactional
data—a practice commonly known as market basket analysis due to its widespread use among retail
stores.
To remember:
- Association rules: unsupervised learning on unlabeled data; the algorithm does not need to be trained
- Useful for large amounts of transactional data (Big Data)
- Rules are evaluated with statistical measures of "interestingness":
  - Confidence => reliability, accuracy of the rule
  - Lift => the strength of the association relative to what would be expected if A and B were independent
- Density = the proportion of non-zero cells in the sparse matrix
Support:
support(A) = count(A) / N, where N is the total number of transactions.
Apriori uses statistical measures of "interestingness", which help evaluate the usefulness and
significance of discovered rules. The most common measures include:
1- Confidence
● Definition: The proportion of transactions containing A in which B also appears; it measures the
rule's reliability.
● Formula: confidence(A → B) = support(A ∩ B) / support(A)
2- Lift
● Definition: The ratio of the observed support of A and B appearing together to the expected
support if A and B were independent.
● Formula: lift(A → B) = support(A ∩ B) / (support(A) × support(B)) = confidence(A → B) / support(B)
● Purpose: Indicates the strength of the association relative to random chance.
○ Lift > 1: Positive association (items occur together more often than by chance); the two
occurrences are dependent on one another. A large lift value is therefore a strong indicator
that a rule is important and reflects a true connection between the items.
○ Lift = 1: No association; the items are independent.
○ Lift < 1: Negative association (items occur together less often than by chance); the
items may be substitutes for each other.
Note: Unlike confidence, where the order of the items matters, lift(X → Y) is the same as lift(Y → X).
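The three measures above can be checked by hand. The following is an illustrative sketch (not from the chapter) that computes support, confidence, and lift in base R on a hypothetical toy set of five transactions:

```r
# Toy transaction data (made up for illustration)
transactions <- list(
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter"),
  c("milk", "bread", "butter"),
  c("bread")
)
n <- length(transactions)

# support(items): fraction of transactions containing all of the given items
support <- function(items) {
  sum(sapply(transactions, function(t) all(items %in% t))) / n
}

s_milk_bread <- support(c("milk", "bread"))                    # 2/5 = 0.4
conf <- s_milk_bread / support("milk")                         # confidence(milk -> bread) = 2/3
lift <- s_milk_bread / (support("milk") * support("bread"))    # 0.4 / (0.6 * 0.8) = 5/6
```

Here lift is below 1, so in this toy data milk and bread appear together slightly less often than chance would predict.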
Strong rules have high support and confidence, making them valuable for decision-making, such as
optimizing product placement in a store. However, checking all possible item combinations becomes
impractical for large datasets. To address this, the Apriori algorithm applies minimum thresholds for
support and confidence to efficiently identify the most relevant rules.
Strengths
- Is ideally suited for working with very large amounts of transactional data
- Results in rules that are easy to understand
- Useful for "data mining" and discovering unexpected knowledge in databases
Weaknesses
- Not very helpful for small datasets
- Takes effort to separate the insight from the common sense
- Easy to draw spurious conclusions from random patterns
R application:
Data extraction and preparation:

groceries <- read.transactions("groceries.csv", sep = ",")
Similar to read.csv() except that it results in a sparse matrix suitable for transactional data.
The parameter sep = "," specifies that items in the input file are separated by a comma.

summary(groceries)
Used to see some basic information about the groceries dataset.

itemFrequency(groceries[, 1:3])
Allows us to see the proportion of transactions that contain each of the first three items.

itemFrequencyPlot(groceries, support = 0.1)
Results in a histogram showing the items in the data with at least 10 percent support.

image(sample(groceries, 100))
Allows us to view the sparse matrix for a randomly sampled set of 100 transactions.

groceryrules <- apriori(groceries, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
This saves our rules in a rules object, which we can peek into by typing its name:
groceryrules
set of 463 rules
Our groceryrules object contains a set of 463 association rules. To determine whether any of
them are useful, we'll have to dig deeper.

berryrules <- subset(groceryrules, items %in% "berries")
inspect(berryrules)
Finds any rules with berries appearing in the rule, then prints them.
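Beyond subsetting, a natural next step is to rank rules by an interestingness measure using the arules package's sort() and inspect() functions. The following standalone sketch (the toy transaction list is made up for illustration) mines rules from a tiny dataset and ranks them by lift; the same sort()/inspect() pattern applies directly to groceryrules:

```r
library(arules)  # provides transactions objects, apriori(), sort(), inspect()

# Build a tiny transactions object from a list of item vectors
toy <- as(list(
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter"),
  c("milk", "bread", "butter")
), "transactions")

# Mine rules with minimum support and confidence thresholds
toyrules <- apriori(toy, parameter = list(support = 0.25, confidence = 0.5, minlen = 2))

# Rank the discovered rules by lift and display the strongest ones
inspect(sort(toyrules, by = "lift"))
```

Sorting by "confidence" or "support" instead of "lift" works the same way.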