Association Rule Mining is a data mining technique used to discover interesting relationships
(associations, patterns, correlations) among variables in large datasets. It is often used in
market basket analysis to find which items are frequently bought together.
Key Concepts
● Support: How often an itemset appears in the dataset.
\text{Support}(X) = \frac{\text{Transactions containing X}}{\text{Total transactions}}
● Confidence: How often items in Y appear in transactions that contain X.
\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
● Lift: Strength of a rule compared to random chance.
\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}
Simple Example
Suppose we have a dataset of 5 transactions over items such as Milk, Diaper, and Beer.
Rule Example: {Milk, Diaper} → {Beer}
1. Support:
Transactions containing {Milk, Diaper, Beer} = 3 (Transactions 2, 3, 4)
Total transactions = 5
\text{Support} = \frac{3}{5} = 0.6
2. Confidence:
Transactions with {Milk, Diaper} = 4 (Transactions 2, 3, 4, 5)
\text{Confidence} = \frac{3}{4} = 0.75
3. Lift:
Support of {Beer} = 3/5 = 0.6
\text{Lift} = \frac{0.75}{0.6} = 1.25
Interpretation
● Support = 0.6 → 60% of transactions include Milk, Diaper, and Beer.
● Confidence = 0.75 → If Milk and Diaper are bought, there’s a 75% chance Beer is
also bought.
● Lift = 1.25 (>1) → Positive correlation: buying Milk and Diaper increases likelihood of
buying Beer.
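The calculation above can be checked with a short Python sketch. The original transaction table is not reproduced here, so the five transactions below are an assumption chosen only to match the counts used in the example (three transactions contain {Milk, Diaper, Beer}, four contain {Milk, Diaper}, and three contain Beer).

```python
# Hypothetical 5-transaction dataset, consistent with the counts in the example above.
transactions = [
    {"Bread", "Eggs"},                    # T1 (assumed: no Beer, not both Milk & Diaper)
    {"Milk", "Diaper", "Beer"},           # T2
    {"Milk", "Diaper", "Beer", "Bread"},  # T3
    {"Milk", "Diaper", "Beer", "Cola"},   # T4
    {"Milk", "Diaper", "Cola"},           # T5
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
supp_xy = support(X | Y, transactions)           # Support of the combined itemset (X union Y)
confidence = supp_xy / support(X, transactions)  # Support(X union Y) / Support(X)
lift = confidence / support(Y, transactions)     # Confidence / Support(Y)

print(supp_xy, confidence, lift)  # 0.6 0.75 1.25, matching the worked example
```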
ROC Curve
1. What is an ROC Curve?
● ROC (Receiver Operating Characteristic) curve is a graph that shows the
performance of a binary classifier at different thresholds.
● It plots:
○ True Positive Rate (TPR / Sensitivity) on the Y-axis.
○ False Positive Rate (FPR) on the X-axis.
TPR = \frac{TP}{TP + FN} \quad\quad FPR = \frac{FP}{FP + TN}
2. Key Points on the ROC Curve
(a) Always Positive Classifier
● The classifier predicts every sample as positive.
● That means:
○ TPR = 1 (it catches all positives).
○ FPR = 1 (it wrongly flags all negatives as positive).
● Point on ROC Curve: (1, 1) → top-right corner.
(b) Always Negative Classifier
● The classifier predicts every sample as negative.
● That means:
○ TPR = 0 (misses all positives).
○ FPR = 0 (it never flags a negative as positive).
● Point on ROC Curve: (0, 0) → bottom-left corner.
(c) Perfect Prediction
● The classifier makes no errors.
● That means:
○ TPR = 1 (all positives are correctly classified).
○ FPR = 0 (no negatives are misclassified).
● Point on ROC Curve: (0, 1) → top-left corner.
● This represents the ideal model.
3. Visual Layout of the ROC Curve
● (0,0): Always Negative
● (1,1): Always Positive
● (0,1): Perfect Classifier
● Diagonal Line (0,0 → 1,1): Random guessing
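A minimal Python sketch (assuming NumPy is available) of where these three degenerate classifiers land on the ROC plane; the label vector is made up purely for illustration.

```python
import numpy as np

def roc_point(y_true, y_pred):
    """Return (FPR, TPR): FPR = FP / (FP + TN), TPR = TP / (TP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return float(fp / (fp + tn)), float(tp / (tp + fn))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground-truth labels

always_pos = [1] * len(y_true)      # predicts every sample as positive
always_neg = [0] * len(y_true)      # predicts every sample as negative
perfect    = list(y_true)           # makes no errors

print(roc_point(y_true, always_pos))  # (1.0, 1.0) -> top-right corner
print(roc_point(y_true, always_neg))  # (0.0, 0.0) -> bottom-left corner
print(roc_point(y_true, perfect))     # (0.0, 1.0) -> top-left corner (ideal)
```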
Types of Machine Learning
1. Supervised Learning
● Definition: Model learns from labeled data (input + correct output).
● Goal: Predict outcomes for new, unseen data.
● Examples:
○ Classification (spam vs not spam, disease diagnosis)
○ Regression (predicting house prices, temperature)
2. Unsupervised Learning
● Definition: Model learns from unlabeled data (only input, no output).
● Goal: Find hidden patterns, groupings, or structure in data.
● Examples:
○ Clustering (customer segmentation, grouping news articles)
○ Dimensionality reduction (PCA for feature reduction, visualization)
3. Reinforcement Learning
● Definition: Model learns by interacting with an environment. It gets rewards for good
actions and penalties for bad actions.
● Goal: Learn a sequence of actions that maximizes long-term rewards.
● Examples:
○ Game playing (Chess, Go, Atari, AlphaGo)
○ Robotics (teaching robots to walk, pick objects)
○ Self-driving cars (decision making in traffic)
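A brief scikit-learn sketch (assuming scikit-learn and NumPy are installed) contrasting the first two paradigms: the supervised model is fit on features and labels, the unsupervised model on features alone. Reinforcement learning is omitted here because it needs an environment and reward loop rather than a static dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised: needs labels
from sklearn.cluster import KMeans                   # unsupervised: no labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])  # toy features
y = np.array([0] * 20 + [1] * 20)                                      # toy labels

# Supervised learning: fit on (X, y), then predict outputs for new data.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[4.8, 5.2]]))        # likely [1]: the class centred near (5, 5)

# Unsupervised learning: fit on X alone and discover groupings.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5], km.labels_[-5:])  # cluster assignments found without labels
```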
Evaluation of Accuracy in Machine Learning
Accuracy is one of the simplest and most commonly used evaluation metrics in ML,
especially for classification problems.
1. Definition of Accuracy
Accuracy measures the proportion of correctly predicted instances out of the total instances.
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}
Or in terms of the confusion matrix:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
Where:
● TP (True Positive): Correctly predicted positives
● TN (True Negative): Correctly predicted negatives
● FP (False Positive): Incorrectly predicted positives
● FN (False Negative): Incorrectly predicted negatives
2. Example
Suppose we have 100 emails:
● 70 are spam, 30 are not spam.
● Our model predicts:
○ 65 spam correctly (TP)
○ 25 not spam correctly (TN)
○ 5 not spam misclassified as spam (FP)
○ 5 spam misclassified as not spam (FN)
\text{Accuracy} = \frac{65 + 25}{100} = \frac{90}{100} = 90\%
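The arithmetic can be verified directly from the confusion-matrix counts above:

```python
# Counts from the spam example above.
TP, TN, FP, FN = 65, 25, 5, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.9 -> 90%
```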
3. Limitations of Accuracy
● Accuracy works well when the data is balanced (roughly equal numbers of positives and negatives).
● Misleading when data is imbalanced.
Example: If 95% of patients are healthy and only 5% are sick, a model predicting all
healthy gets 95% accuracy — but it is useless for detecting sick patients.
4. Alternatives to Accuracy
When imbalance exists, other metrics are used:
● Precision (how many predicted positives are correct)
● Recall / Sensitivity (how many actual positives are caught)
● F1-score (harmonic mean of precision and recall)
● ROC-AUC (overall discrimination ability)
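The pitfall from section 3 and the alternative metrics listed above can be illustrated with a short scikit-learn sketch (assuming scikit-learn and NumPy are installed). The 95/5 healthy-vs-sick split follows the example, and the "model" trivially predicts healthy for everyone.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 95 healthy (0) and 5 sick (1) patients, as in the example above.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # trivial model: predicts "healthy" for everyone

print(accuracy_score(y_true, y_pred))                    # 0.95 -> looks great
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0  -> catches no sick patients
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0  -> no correct positive predictions
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0
```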
Bayes’ Theorem in Machine Learning
1. Definition
Bayes’ theorem describes the probability of an event based on prior knowledge of conditions
related to the event.
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
Where:
● P(A|B) → Posterior probability (probability of A given B).
● P(B|A) → Likelihood (probability of B given A).
● P(A) → Prior probability of A.
● P(B) → Marginal probability of B (normalizing constant).
2. Intuition
● Start with an initial belief (prior).
● See some new evidence (likelihood).
● Update belief into a revised probability (posterior).
3. Simple Example
Suppose:
● 1% of people have a disease (Prior: P(Disease) = 0.01).
● The test detects the disease correctly 99% of the time (P(Pos|Disease) = 0.99).
● The test gives a false positive 5% of the time (P(Pos|No Disease) = 0.05).
Question: If a person tests positive, what is the probability they actually have the disease?
P(Disease|Positive) = \frac{P(Pos|Disease) \cdot P(Disease)}{P(Pos)}
Step 1: Compute denominator P(Pos):
P(Pos) = P(Pos|Disease) \cdot P(Disease) + P(Pos|No Disease) \cdot P(No Disease)
= (0.99 \times 0.01) + (0.05 \times 0.99) = 0.0099 + 0.0495 = 0.0594
Step 2: Compute posterior:
P(Disease|Positive) = \frac{0.99 \times 0.01}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667
Answer: Even after testing positive, the probability the person has the disease is ~16.7%, not
99%!
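The arithmetic can be checked with a few lines of Python using the numbers above:

```python
# Numbers from the example above.
p_disease = 0.01
p_pos_given_disease = 0.99     # test sensitivity
p_pos_given_no_disease = 0.05  # false positive rate

# Total probability of a positive test (the denominator P(Pos)).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

# Bayes' theorem: posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_pos, 4), round(p_disease_given_pos, 4))  # 0.0594 0.1667
```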
4. Relevance in ML
● The Naïve Bayes classifier applies Bayes' theorem with the simplifying assumption that features are conditionally independent given the class (see the sketch below).
● It is widely applied in spam filtering, text classification, medical diagnosis, and recommendation systems.
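A minimal Naïve Bayes text-classification sketch with scikit-learn (assuming it is installed); the tiny corpus and labels are made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam.
texts  = ["win money now", "free prize win", "meeting at noon", "lunch tomorrow?"]
labels = [1, 1, 0, 0]

# Bag-of-words features + Multinomial Naive Bayes (Bayes' theorem with the
# "naive" assumption that word occurrences are independent given the class).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free money prize"]))  # likely [1] (spam)
print(model.predict(["see you at lunch"]))  # likely [0] (not spam)
```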