Association Rule Mining is a data mining technique used to discover interesting relationships
(associations, patterns, correlations) among variables in large datasets. It is often used in
market basket analysis to find which items are frequently bought together.
Key Concepts
● Support: How often an itemset appears in the dataset.
\text{Support}(X) = \frac{\text{Transactions containing X}}{\text{Total transactions}}
● Confidence: How often items in Y appear in transactions that contain X.
\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
● Lift: Strength of a rule compared to random chance.
\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}
Simple Example
Suppose we have a dataset of 5 transactions over items such as Milk, Diaper, and Beer.
Rule Example: {Milk, Diaper} → {Beer}
1. Support:
Transactions containing {Milk, Diaper, Beer} = 3 (Transactions 2, 3, 4)
Total transactions = 5
\text{Support} = \frac{3}{5} = 0.6
2. Confidence:
Transactions with {Milk, Diaper} = 4 (Transactions 2, 3, 4, 5)
\text{Confidence} = \frac{3}{4} = 0.75
3. Lift:
Support of {Beer} = 3/5 = 0.6
\text{Lift} = \frac{0.75}{0.6} = 1.25
Interpretation
● Support = 0.6 → 60% of transactions include Milk, Diaper, and Beer.
● Confidence = 0.75 → If Milk and Diaper are bought, there’s a 75% chance Beer is
also bought.
● Lift = 1.25 (>1) → Positive correlation: buying Milk and Diaper increases likelihood of
buying Beer.
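The calculation above can be checked with a short Python sketch. The original transaction table is not reproduced here, so the five transactions below are an assumption chosen only to match the counts used in the example (three transactions contain {Milk, Diaper, Beer}, four contain {Milk, Diaper}, and three contain Beer).

```python
# Hypothetical 5-transaction dataset, consistent with the counts in the example above.
transactions = [
    {"Bread", "Eggs"},                    # T1 (assumed: no Beer, not both Milk & Diaper)
    {"Milk", "Diaper", "Beer"},           # T2
    {"Milk", "Diaper", "Beer", "Bread"},  # T3
    {"Milk", "Diaper", "Beer", "Cola"},   # T4
    {"Milk", "Diaper", "Cola"},           # T5
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
supp_xy = support(X | Y, transactions)           # Support of the combined itemset (X union Y)
confidence = supp_xy / support(X, transactions)  # Support(X union Y) / Support(X)
lift = confidence / support(Y, transactions)     # Confidence / Support(Y)

print(supp_xy, confidence, lift)  # 0.6 0.75 1.25, matching the worked example
```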
ROC Curve
1. What is an ROC Curve?
● ROC (Receiver Operating Characteristic) curve is a graph that shows the
performance of a binary classifier at different thresholds.
● It plots:
○ True Positive Rate (TPR / Sensitivity) on the Y-axis.
○ False Positive Rate (FPR) on the X-axis.
TPR = \frac{TP}{TP + FN} \quad\quad FPR = \frac{FP}{FP + TN}
2. Key Points on the ROC Curve
(a) Always Positive Classifier
● The classifier predicts every sample as positive.
● That means:
○ TPR = 1 (it catches all positives).
○ FPR = 1 (it wrongly flags all negatives as positive).
● Point on ROC Curve: (1, 1) → top-right corner.
(b) Always Negative Classifier
● The classifier predicts every sample as negative.
● That means:
○ TPR = 0 (misses all positives).
○ FPR = 0 (it never flags a negative as positive).
● Point on ROC Curve: (0, 0) → bottom-left corner.
(c) Perfect Prediction
● The classifier makes no errors.
● That means:
○ TPR = 1 (all positives are correctly classified).
○ FPR = 0 (no negatives are misclassified).
● Point on ROC Curve: (0, 1) → top-left corner.
● This represents the ideal model.
3. Visual Layout of the ROC Curve
● (0,0): Always Negative
● (1,1): Always Positive
● (0,1): Perfect Classifier
● Diagonal Line (0,0 → 1,1): Random guessing
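A minimal Python sketch (assuming NumPy is available) of where these three degenerate classifiers land on the ROC plane; the label vector is made up purely for illustration.

```python
import numpy as np

def roc_point(y_true, y_pred):
    """Return (FPR, TPR): FPR = FP / (FP + TN), TPR = TP / (TP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return float(fp / (fp + tn)), float(tp / (tp + fn))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground-truth labels

always_pos = [1] * len(y_true)      # predicts every sample as positive
always_neg = [0] * len(y_true)      # predicts every sample as negative
perfect    = list(y_true)           # makes no errors

print(roc_point(y_true, always_pos))  # (1.0, 1.0) -> top-right corner
print(roc_point(y_true, always_neg))  # (0.0, 0.0) -> bottom-left corner
print(roc_point(y_true, perfect))     # (0.0, 1.0) -> top-left corner (ideal)
```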
Types of Machine Learning
1. Supervised Learning
● Definition: Model learns from labeled data (input + correct output).
● Goal: Predict outcomes for new, unseen data.
● Examples:
○ Classification (spam vs not spam, disease diagnosis)
○ Regression (predicting house prices, temperature)
2. Unsupervised Learning
● Definition: Model learns from unlabeled data (only input, no output).
● Goal: Find hidden patterns, groupings, or structure in data.
● Examples:
○ Clustering (customer segmentation, grouping news articles)
○ Dimensionality reduction (PCA for feature reduction, visualization)
3. Reinforcement Learning
● Definition: Model learns by interacting with an environment. It gets rewards for good
actions and penalties for bad actions.
● Goal: Learn a sequence of actions that maximizes long-term rewards.
● Examples:
○ Game playing (Chess, Go, Atari, AlphaGo)
○ Robotics (teaching robots to walk, pick objects)
○ Self-driving cars (decision making in traffic)
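A brief scikit-learn sketch (assuming scikit-learn and NumPy are installed) contrasting the first two paradigms: the supervised model is fit on features and labels, the unsupervised model on features alone. Reinforcement learning is omitted here because it needs an environment and reward loop rather than a static dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised: needs labels
from sklearn.cluster import KMeans                   # unsupervised: no labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])  # toy features
y = np.array([0] * 20 + [1] * 20)                                      # toy labels

# Supervised learning: fit on (X, y), then predict outputs for new data.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[4.8, 5.2]]))        # likely [1]: the class centred near (5, 5)

# Unsupervised learning: fit on X alone and discover groupings.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5], km.labels_[-5:])  # cluster assignments found without labels
```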
Evaluation of Accuracy in Machine Learning
Accuracy is one of the simplest and most commonly used evaluation metrics in ML,
especially for classification problems.
1. Definition of Accuracy
Accuracy measures the proportion of correctly predicted instances out of the total instances.
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}
Or in terms of the confusion matrix:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
Where:
● TP (True Positive): Correctly predicted positives
● TN (True Negative): Correctly predicted negatives
● FP (False Positive): Incorrectly predicted positives
● FN (False Negative): Incorrectly predicted negatives
2. Example
Suppose we have 100 emails:
● 70 are spam, 30 are not spam.
● Our model predicts:
○ 65 spam correctly (TP)
○ 25 not spam correctly (TN)
○ 5 not spam misclassified as spam (FP)
○ 5 spam misclassified as not spam (FN)
\text{Accuracy} = \frac{65 + 25}{100} = \frac{90}{100} = 90\%
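The arithmetic can be verified directly from the confusion-matrix counts above:

```python
# Counts from the spam example above.
TP, TN, FP, FN = 65, 25, 5, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.9 -> 90%
```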
3. Limitations of Accuracy
● Accuracy works well when the data is balanced (roughly equal numbers of positives and negatives).
● Misleading when data is imbalanced.
Example: If 95% of patients are healthy and only 5% are sick, a model predicting all
healthy gets 95% accuracy — but it is useless for detecting sick patients.
4. Alternatives to Accuracy
When imbalance exists, other metrics are used:
● Precision (how many predicted positives are correct)
● Recall / Sensitivity (how many actual positives are caught)
● F1-score (harmonic mean of precision and recall)
● ROC-AUC (overall discrimination ability)
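The pitfall from section 3 and the alternative metrics listed above can be illustrated with a short scikit-learn sketch (assuming scikit-learn and NumPy are installed). The 95/5 healthy-vs-sick split follows the example, and the "model" trivially predicts healthy for everyone.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 95 healthy (0) and 5 sick (1) patients, as in the example above.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # trivial model: predicts "healthy" for everyone

print(accuracy_score(y_true, y_pred))                    # 0.95 -> looks great
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0  -> catches no sick patients
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0  -> no correct positive predictions
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0
```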
Bayes’ Theorem in Machine Learning
1. Definition
Bayes’ theorem describes the probability of an event based on prior knowledge of conditions
related to the event.
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
Where:
● P(A|B) → Posterior probability (probability of A given B).
● P(B|A) → Likelihood (probability of B given A).
● P(A) → Prior probability of A.
● P(B) → Marginal probability of B (normalizing constant).
2. Intuition
● Start with an initial belief (prior).
● See some new evidence (likelihood).
● Update belief into a revised probability (posterior).
3. Simple Example
Suppose:
● 1% of people have a disease (Prior: P(Disease) = 0.01).
● The test detects the disease correctly 99% of the time (P(Pos|Disease) = 0.99).
● The test gives a false positive 5% of the time (P(Pos|No Disease) = 0.05).
Question: If a person tests positive, what is the probability they actually have the disease?
P(Disease|Positive) = \frac{P(Pos|Disease) \cdot P(Disease)}{P(Pos)}
Step 1: Compute denominator P(Pos):
P(Pos) = P(Pos|Disease) \cdot P(Disease) + P(Pos|No Disease) \cdot P(No Disease)
= (0.99 \times 0.01) + (0.05 \times 0.99) = 0.0099 + 0.0495 = 0.0594
Step 2: Compute posterior:
P(Disease|Positive) = \frac{0.99 \times 0.01}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667
Answer: Even after testing positive, the probability the person has the disease is ~16.7%, not
99%!
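The arithmetic can be checked with a few lines of Python using the numbers above:

```python
# Numbers from the example above.
p_disease = 0.01
p_pos_given_disease = 0.99     # test sensitivity
p_pos_given_no_disease = 0.05  # false positive rate

# Total probability of a positive test (the denominator P(Pos)).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

# Bayes' theorem: posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_pos, 4), round(p_disease_given_pos, 4))  # 0.0594 0.1667
```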
4. Relevance in ML
● The Naïve Bayes classifier applies Bayes' theorem with the simplifying assumption that features are conditionally independent given the class (see the sketch below).
● It is widely applied in spam filtering, text classification, medical diagnosis, and recommendation systems.
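A minimal Naïve Bayes text-classification sketch with scikit-learn (assuming it is installed); the tiny corpus and labels are made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam.
texts  = ["win money now", "free prize win", "meeting at noon", "lunch tomorrow?"]
labels = [1, 1, 0, 0]

# Bag-of-words features + Multinomial Naive Bayes (Bayes' theorem with the
# "naive" assumption that word occurrences are independent given the class).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free money prize"]))  # likely [1] (spam)
print(model.predict(["see you at lunch"]))  # likely [0] (not spam)
```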