
Association Rule Mining is a data mining technique used to discover interesting relationships (associations, patterns, correlations) among variables in large datasets. It is often used in market basket analysis to find which items are frequently bought together.

Key Concepts

● Support: How often an itemset appears in the dataset.

\text{Support}(X) = \frac{\text{Transactions containing X}}{\text{Total transactions}}

● Confidence: How often items in Y appear in transactions that contain X.

\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}

● Lift: Strength of a rule compared to random chance.

\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}

Simple Example

Suppose we have a dataset of 5 transactions over items such as Milk, Diaper, and Beer, with the counts used below.

Rule Example: {Milk, Diaper} → {Beer}

1. Support:

Transactions containing {Milk, Diaper, Beer} = 3 (Transactions 2, 3, 4)

Total transactions = 5

\text{Support} = \frac{3}{5} = 0.6

2. Confidence:

Transactions with {Milk, Diaper} = 4 (Transactions 2, 3, 4, 5)

\text{Confidence} = \frac{3}{4} = 0.75

3. Lift:
Support of {Beer} = 3/5 = 0.6

\text{Lift} = \frac{0.75}{0.6} = 1.25

Interpretation

● Support = 0.6 → 60% of transactions include Milk, Diaper, and Beer.

● Confidence = 0.75 → If Milk and Diaper are bought, there’s a 75% chance Beer is
also bought.

● Lift = 1.25 (>1) → Positive correlation: buying Milk and Diaper increases likelihood of
buying Beer.
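To make the arithmetic concrete, here is a minimal Python sketch that recomputes all three metrics. The transaction contents are hypothetical, chosen only to be consistent with the counts quoted above.

```python
# Hypothetical transactions consistent with the counts above:
# {Milk, Diaper, Beer} in 3 of 5, {Milk, Diaper} in 4 of 5, {Beer} in 3 of 5.
transactions = [
    {"Bread", "Milk"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Diaper"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

antecedent, consequent = {"Milk", "Diaper"}, {"Beer"}

rule_support = support(antecedent | consequent, transactions)   # 0.6
confidence = rule_support / support(antecedent, transactions)   # 0.75
lift = confidence / support(consequent, transactions)           # 1.25

print(round(rule_support, 2), round(confidence, 2), round(lift, 2))
```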

ROC Curve in Machine Learning

1. What is the ROC Curve?

● ROC (Receiver Operating Characteristic) curve is a graph that shows the performance of a binary classifier at different thresholds.

● It plots:

○ True Positive Rate (TPR / Sensitivity) on the Y-axis.

○ False Positive Rate (FPR) on the X-axis.

TPR = \frac{TP}{TP + FN} \quad\quad FPR = \frac{FP}{FP + TN}

2. Key Points on the ROC Curve

(a) Always Positive Classifier
● The classifier predicts every sample as positive.

● That means:

○ TPR = 1 (it catches all positives).

○ FPR = 1 (it wrongly flags all negatives as positive).

● Point on ROC Curve: (1, 1) → top-right corner.

(b) Always Negative Classifier

● The classifier predicts every sample as negative.

● That means:

○ TPR = 0 (misses all positives).

○ FPR = 0 (it never falsely flags a negative as positive).

● Point on ROC Curve: (0, 0) → bottom-left corner.

(c) Perfect Prediction

● The classifier makes no errors.

● That means:

○ TPR = 1 (all positives are correctly classified).

○ FPR = 0 (no negatives are misclassified).

● Point on ROC Curve: (0, 1) → top-left corner.

● This represents the ideal model.


3. Visual Layout of the ROC Curve

● (0,0): Always Negative

● (1,1): Always Positive

● (0,1): Perfect Classifier

● Diagonal Line (0,0 → 1,1): Random guessing
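A small sketch (with made-up labels, not from the source) showing how the three special-case classifiers land on those corners. Recall that an ROC point is plotted as (FPR, TPR):

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 = positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

y_true = [1, 1, 1, 0, 0]  # hypothetical ground truth

print(tpr_fpr(y_true, [1] * 5))  # (1.0, 1.0) -> ROC point (1, 1), always positive
print(tpr_fpr(y_true, [0] * 5))  # (0.0, 0.0) -> ROC point (0, 0), always negative
print(tpr_fpr(y_true, y_true))   # (1.0, 0.0) -> ROC point (0, 1), perfect classifier
```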

Types of Machine Learning

1. Supervised Learning

● Definition: Model learns from labeled data (input + correct output).

● Goal: Predict outcomes for new, unseen data.

● Examples:

○ Classification (spam vs not spam, disease diagnosis)

○ Regression (predicting house prices, temperature)

2. Unsupervised Learning

● Definition: Model learns from unlabeled data (only input, no output).

● Goal: Find hidden patterns, groupings, or structure in data.

● Examples:

○ Clustering (customer segmentation, grouping news articles)

○ Dimensionality reduction (PCA for feature reduction, visualization)


3. Reinforcement Learning

● Definition: Model learns by interacting with an environment. It gets rewards for good
actions and penalties for bad actions.

● Goal: Learn a sequence of actions that maximizes long-term rewards.

● Examples:

○ Game playing (Chess, Go, Atari games; e.g., AlphaGo)

○ Robotics (teaching robots to walk, pick objects)

○ Self-driving cars (decision making in traffic)
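As a rough illustration of the first two paradigms, here is a minimal scikit-learn sketch (assuming scikit-learn is installed; the data is synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two synthetic 2-D blobs: class 0 around (0, 0), class 1 around (3, 3).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: labels y are given, and the model learns to predict them.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[3, 3]]))  # -> [1]

# Unsupervised: only X is given, and the model discovers groupings itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])  # cluster ids (the 0/1 assignment is arbitrary)
```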

Evaluation of Accuracy in Machine Learning

Accuracy is one of the simplest and most commonly used evaluation metrics in ML,
especially for classification problems.

1. Definition of Accuracy

Accuracy measures the proportion of correctly predicted instances out of the total instances.

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}

Or in terms of the confusion matrix:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

Where:

● TP (True Positive): Correctly predicted positives

● TN (True Negative): Correctly predicted negatives

● FP (False Positive): Incorrectly predicted positives

● FN (False Negative): Incorrectly predicted negatives


2. Example

Suppose we have 100 emails:

● 70 are spam, 30 are not spam.

● Our model predicts:

○ 65 spam correctly (TP)

○ 25 not spam correctly (TN)

○ 5 not spam misclassified as spam (FP)

○ 5 spam misclassified as not spam (FN)

\text{Accuracy} = \frac{65 + 25}{100} = \frac{90}{100} = 90\%
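The same arithmetic as a quick Python check, using the confusion-matrix counts from this example:

```python
tp, tn, fp, fn = 65, 25, 5, 5  # counts from the spam example above

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.9, i.e. 90%
```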

3. Limitations of Accuracy

● Accuracy works well when the data is balanced (roughly equal numbers of positives and negatives).

● Misleading when data is imbalanced.

Example: If 95% of patients are healthy and only 5% are sick, a model predicting all
healthy gets 95% accuracy — but it is useless for detecting sick patients.

4. Alternatives to Accuracy

When imbalance exists, other metrics are used:

● Precision (how many predicted positives are correct)

● Recall / Sensitivity (how many actual positives are caught)

● F1-score (harmonic mean of precision and recall)

● ROC-AUC (overall discrimination ability)
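A minimal sketch of the imbalance problem from point 3, assuming scikit-learn is available (the labels are hypothetical: 95 healthy, 5 sick):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0] * 95 + [1] * 5   # 0 = healthy, 1 = sick
y_pred = [0] * 100            # a model that predicts "healthy" for everyone

print(accuracy_score(y_true, y_pred))                    # 0.95 -> looks impressive
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0  (no positive predictions)
print(recall_score(y_true, y_pred))                      # 0.0  -> catches no sick patients
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0
```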


Bayes’ Theorem in Machine Learning

1. Definition

Bayes’ theorem describes the probability of an event based on prior knowledge of conditions
related to the event.

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

Where:

● P(A|B) → Posterior probability (probability of A given B).

● P(B|A) → Likelihood (probability of B given A).

● P(A) → Prior probability of A.

● P(B) → Marginal probability of B (normalizing constant).

2. Intuition

● Start with an initial belief (prior).

● See some new evidence (likelihood).

● Update belief into a revised probability (posterior).

3. Simple Example

Suppose:

● 1% of people have a disease (Prior: P(Disease) = 0.01).

● A test detects the disease correctly 99% of the time (P(Pos|Disease) = 0.99).

● Test gives false positive 5% of the time (P(Pos|No Disease) = 0.05).

Question: If a person tests positive, what is the probability they actually have the disease?

P(Disease|Positive) = \frac{P(Pos|Disease) \cdot P(Disease)}{P(Pos)}


Step 1: Compute denominator P(Pos):

P(Pos) = P(Pos|Disease) \cdot P(Disease) + P(Pos|No Disease) \cdot P(No Disease)

= (0.99 \times 0.01) + (0.05 \times 0.99) = 0.0099 + 0.0495 = 0.0594

Step 2: Compute posterior:

P(Disease|Positive) = \frac{0.99 \times 0.01}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667

Answer: Even after testing positive, the probability the person has the disease is ~16.7%, not
99%!
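The same calculation as a short Python check, with the numbers from this example:

```python
p_disease = 0.01        # prior P(Disease)
p_pos_given_d = 0.99    # P(Pos | Disease), test sensitivity
p_pos_given_nd = 0.05   # P(Pos | No Disease), false positive rate

# Law of total probability: marginal P(Pos).
p_pos = p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)

# Bayes' theorem: posterior P(Disease | Pos).
posterior = p_pos_given_d * p_disease / p_pos
print(round(p_pos, 4), round(posterior, 4))  # 0.0594 0.1667
```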

4. Relevance in ML

● The Naïve Bayes classifier applies Bayes’ theorem with a conditional independence assumption (features are treated as independent given the class).

● Widely applied in spam filtering, text classification, medical diagnosis, and recommendation systems.
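As a rough sketch of one such application, here is a toy spam filter with hypothetical data (assuming scikit-learn is available):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical corpus: 1 = spam, 0 = not spam.
texts = ["win money now", "meeting at noon", "cheap money offer", "lunch tomorrow"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)          # bag-of-words word counts
clf = MultinomialNB().fit(X, labels)  # Naive Bayes with word-independence assumption

print(clf.predict(vec.transform(["free money today"])))  # likely [1] (spam)
```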
