Lec-7 Intro Machine Learning
Topics
• What is Machine Learning?
• What is Deep Learning?
• Difference between Supervised and Unsupervised Learning
• Supervised Learning Process
• Evaluating performance
• Overfitting
Machine Learning (ML)
• Subset/branch/subfield of Artificial Intelligence (AI)
• Machines learning to imitate human intelligence
• "Focuses on the use of data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy." - IBM
• Allows computers to learn without explicit programming
[Figure: nested circles - Deep Learning inside Machine Learning inside Artificial Intelligence (AI)]
Traditional Programming vs. ML
• Traditional programming: rules (program) + data → output
• Machine learning: data + expected output → learned rules (model)
ML Related Fields
• Related fields: data mining, statistics, control theory, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, neuroscience
• Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.
ML Applications
• Fraud detection
• Web search results
• Real-time ads on web pages
• Credit scoring
• Network intrusion detection
• Recommendation engines
• Customer segmentation
Types of ML
• Supervised Machine Learning
• Unsupervised Machine Learning
• Semi-Supervised Machine Learning
• Reinforcement Machine Learning
Supervised Machine Learning
• Model gets trained on a "labelled dataset"
• High accuracy, as models are trained on labelled data
• Can be time-consuming and costly, as it relies on labelled data only
• Two main categories:
  • Classification
  • Regression
[Figure: example images, one labelled "America" and one labelled "British"]
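To make this concrete, here is a minimal sketch of supervised classification with scikit-learn. The slides don't name a library or dataset, so the library choice and the built-in iris dataset are assumptions standing in for any labelled dataset.

```python
# Minimal supervised-classification sketch (scikit-learn assumed installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)            # features and labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)    # hold out labelled test data

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                  # learn from labelled examples
print("Test accuracy:", model.score(X_test, y_test))
```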
Unsupervised Machine Learning
• Model is trained on unlabelled data and discovers patterns or groupings on its own
Semi-Supervised Machine Learning
• Sits between supervised and unsupervised learning, using both labelled and unlabelled data
• Useful when obtaining labelled data is costly, time-consuming, or resource-intensive
Supervised
• Classification: Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, K-Nearest Neighbors (KNN), Naive Bayes
• Regression: Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, Decision Tree, Random Forest

Unsupervised
• Clustering: K-Means, Mean-shift, DBSCAN
• Dimensionality reduction: Principal Component Analysis (PCA), Independent Component Analysis (ICA)
• Association: Apriori, Eclat, FP-growth
Reinforcement Machine Learning
• An agent interacts with the environment by producing actions and discovering errors through feedback
• Trial, error, and delayed reward are its most relevant characteristics
• Popular algorithms:
  • Q-learning
  • SARSA (State-Action-Reward-State-Action)
  • Deep Q-learning
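Since the slides name Q-learning, here is a rough sketch of its update rule on a hypothetical toy environment. The 5-state corridor, reward, and constants below are invented for illustration and are not from the lecture.

```python
import numpy as np

# Hypothetical environment: 5 states in a row, reward at the right end.
n_states, n_actions = 5, 2               # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))      # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.5    # high exploration for this toy task

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

for episode in range(100):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection (trial and error)
        a = np.random.randint(n_actions) if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q)  # learned action values; "right" should dominate
```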
ML Applications
Photo No | Tail length (cm) | Neck length (cm) | Has horn? | Is Giraffe?
1 | 5 | 8 | Yes | Yes
2 | 2 | 3 | No | No
3 | 1 | 2 | No | No
4 | 0 | 2 | No | No
Criteria | Labeled Data | Unlabeled Data
Definition | Data with both input features and corresponding output labels | Data with only input features and no output labels
Example | Images labeled with categories like "cat" or "dog" | Images without any category labels
Supervised Learning | Essential for training models | Not used directly in training models
Unsupervised Learning | Not applicable | Essential for discovering patterns and structures
Supervised Machine Learning Process (1)
Data Acquisition → Data Cleaning → Model Training & Building → Model Testing → Model Deployment
(Test Data feeds into Model Testing)
Supervised Machine Learning Process (2)
• Data Acquisition: get your data! Customers, sensors, etc.
Supervised Machine Learning Process (3)
• Data Cleaning: clean and format your data (using Pandas)
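A minimal sketch of what cleaning with Pandas can look like. The file name and column names (customers.csv, age, country) are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical raw file with missing values, duplicates, and messy text.
df = pd.read_csv("customers.csv")

df = df.drop_duplicates()                               # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())        # fill missing values
df["country"] = df["country"].str.strip().str.title()   # normalize text
df = df[df["age"].between(0, 120)]                      # drop obvious outliers
```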
Supervised Machine Learning Process (4)
• Split the cleaned data: hold out Test Data, then use the rest for Model Training & Building
Supervised Machine Learning Process (5)
• Model Testing: evaluate the trained model on the held-out Test Data
Supervised Machine Learning Process (6)
• Based on test results, adjust model parameters and retrain (Model Testing → Adjust Model Parameters → Model Training & Building)
Model Parameters
• Parameters: values the model learns from the training data (e.g., regression coefficients, weights)
• Hyperparameters: values set before training that control the learning process (e.g., K in KNN, learning rate)
Supervised Machine Learning Process (7)
• Model Deployment: release the tested model (full pipeline: Data Acquisition → Data Cleaning → Model Training & Building → Model Testing → Model Deployment)
ML Data Sources
Popular Data Sources
Why Data Cleaning/Preprocessing?
• Data in the real world is dirty
• incomplete: lacking attribute values, lacking certain attributes of
interest, or containing only aggregate data
• noisy: containing errors or outliers
• inconsistent: containing discrepancies in codes or names
• No quality data, no quality mining results!
• Quality decisions must be based on quality data
• Data warehouse needs consistent integration of quality data
Data Reduction Strategies
• Data reduction: obtain a reduced representation of the data set that is much smaller in volume but produces the same (or almost the same) analytical results
• Why data reduction?
  • A database/data warehouse may store terabytes of data; complex data analysis may take a very long time to run on the complete data set
• Data reduction strategies
  • Dimensionality reduction, e.g., remove unimportant attributes
    • Principal Components Analysis (PCA) - see the sketch after this list
    • Feature subset selection, feature creation/extraction
  • Compression, sampling, aggregation, filtering, transformation, …
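A minimal PCA sketch with scikit-learn, using the built-in iris dataset as an assumed stand-in; it reduces four features to two principal components while reporting how much variance is retained.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)    # 150 samples, 4 features each
pca = PCA(n_components=2)            # keep 2 principal components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (150, 2): smaller representation
print(pca.explained_variance_ratio_)    # variance retained per component
```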
Data Splitting
• Training Data
  • Used to train model parameters
• Validation Data
  • Used to determine which model hyperparameters to adjust
• Test Data
  • Used to obtain a final performance metric (a minimal split sketch follows this list)
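One common way to produce the three splits, sketched with scikit-learn's train_test_split. The 60/20/20 ratio and the iris dataset are assumptions for illustration, not from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the test set, then carve validation out of the remainder.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

# Result: 60% training, 20% validation, 20% test
print(len(X_train), len(X_val), len(X_test))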
Cross-validation
• Analogy: Rahim's exam preparation
• Exam questions are drawn from known/unknown chapters
• Result: good/bad
• Is he a good/bad student?
K-Fold Cross Validation
• Divide the dataset into K chunks (i.e., folds) and train K times, using a different fold as the test set each time
• E.g., assume K = 5
K-Fold Cross Validation
• Final model evaluation: (S1 + S2 + S3 + S4 + S5) / 5 (a sketch follows the table)

Iteration (1 to K) | Training Set | Test Set | Performance Score
1 | D2, D3, D4, D5 | D1 | S1
2 | D1, D3, D4, D5 | D2 | S2
3 | D1, D2, D4, D5 | D3 | S3
4 | D1, D2, D3, D5 | D4 | S4
5 | D1, D2, D3, D4 | D5 | S5
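A minimal K-fold sketch using scikit-learn's cross_val_score with K = 5, matching the example above. The classifier (KNN) and dataset (iris) are arbitrary assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)  # K = 5 folds

print(scores)         # the five fold scores: S1 ... S5
print(scores.mean())  # final evaluation: (S1 + ... + S5) / 5
```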
Overfitting and Underfitting
• Underfitting: Poor performance on the training data and poor
generalization to other (unseen) data
• Overfitting: Good performance on training data but poor
generalization to other (unseen) data (memorizing!!)
Performance Metrics/Model Evaluation
• Classification
  • Confusion Matrix (not a metric)
  • Accuracy
  • Precision
  • Recall
  • F1-Score
• Clustering
  • Elbow Method (not a performance metric, but used to find the optimal number of clusters, K)
• Regression
  • MAE, MSE, RMSE (covered later)
Confusion Matrix (not a metric)

Actual \ Predicted | Positive | Negative
Positive | True Positive (TP) | False Negative (FN)
Negative | False Positive (FP) | True Negative (TN)
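A small sketch of building a confusion matrix with scikit-learn; the dog/cat labels echo the example used in the following slides.

```python
from sklearn.metrics import confusion_matrix

y_true = ["dog", "dog", "dog", "cat", "cat"]  # correct labels from y_test
y_pred = ["dog", "dog", "cat", "cat", "dog"]  # model predictions

# Rows = actual class, columns = predicted class.
print(confusion_matrix(y_true, y_pred, labels=["dog", "cat"]))
# [[2 1]
#  [1 1]]
```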
Evaluating Performance: Classification
Model Evaluation
• Take a test image from X_test and feed it to the TRAINED MODEL
• The model makes a prediction on the test image (e.g., DOG)
• Compare the prediction to the correct label from y_test
  • Prediction DOG, correct label DOG: DOG == DOG? The prediction is correct
  • Prediction CAT, correct label DOG: DOG == CAT? The prediction is wrong
Model Evaluation
● Accuracy
  ○ Accuracy in classification problems is the number of correct predictions made by the model divided by the total number of predictions.
  ○ For example, if the X_test set has 100 images and our model correctly predicts 80 of them, accuracy = 80/100 = 0.8, or 80%.
  ○ Accuracy is useful when target classes are well balanced; in our example, we would have roughly the same number of cat images as dog images.
  ○ Accuracy is not a good choice with unbalanced classes! Imagine we had 99 images of dogs and 1 image of a cat: a model that simply always predicted "dog" would get 99% accuracy!
  ○ In this situation we'll want to understand recall and precision (a quick sketch follows).
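A tiny sketch of the unbalanced-class trap, assuming scikit-learn for the accuracy computation.

```python
from sklearn.metrics import accuracy_score

# 99 dogs and 1 cat; a "model" that always answers "dog".
y_true = ["dog"] * 99 + ["cat"]
y_pred = ["dog"] * 100

print(accuracy_score(y_true, y_pred))  # 0.99 -- misleadingly high
```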
Model Evaluation
● Recall/Sensitivity
  ○ Recall = TP / (TP + FN)
  ○ The ability of a model to find all the relevant cases within a dataset.
  ○ Precisely: the number of true positives divided by the number of true positives plus the number of false negatives.
Model Evaluation
● Assume a dataset of 100 possible cancer patients, of whom only 10 actually have cancer
● Recall = TP / (TP + FN)
● Suppose our model correctly identifies 5 patients as having cancer. So True Positive = 5, False Negative = 5
● Recall = 5/10 = 0.5 = 50%
● Recall focuses on false negatives: it reveals how many real cancer patients went unidentified
Model Evaluation
● Precision
  ○ Precision = TP / (TP + FP)
  ○ The ability of a classification model to identify only the relevant data points.
  ○ Precisely: the number of true positives divided by the number of true positives plus the number of false positives.
Model Evaluation
● Assume a dataset of 100 possible cancer patients, of whom only 5 actually have cancer
● Precision = TP / (TP + FP)
● Suppose our model predicts that all 100 patients have cancer. So True Positive + False Positive = 100
● But only 5 patients have cancer, so True Positive = 5
● Precision = 5/100 = 0.05 = 5%
● Precision focuses on false positives: of all patients predicted to have cancer, it measures how many actually do
Model Evaluation: Recall and Precision
● F1-Score
  ○ In cases where we want to find an optimal blend of precision and recall, we can combine the two metrics using what is called the F1 score.
Model Evaluation
● F1-Score
  ○ The F1 score is the harmonic mean of precision and recall, taking both metrics into account in the following equation:
  ○ F1 = 2 × (precision × recall) / (precision + recall)
Model Evaluation
● F1-Score
  ○ We use the harmonic mean instead of a simple average because it punishes extreme values.
  ○ A classifier with a precision of 1.0 and a recall of 0.0 has a simple average of 0.5 but an F1 score of 0.
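A short sketch computing all three metrics with scikit-learn. The labels below are a made-up binary example in the spirit of the cancer-screening slides (1 = cancer, 0 = healthy).

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical screening results: 4 true cancer cases among 10 patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # TP = 2, FN = 2, FP = 1

print(precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.67
print(recall_score(y_true, y_pred))     # 2 / (2 + 2) = 0.50
print(f1_score(y_true, y_pred))         # harmonic mean ≈ 0.57
```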
Evaluating Performance: Regression
Evaluating Regression
● Let's take a moment now to discuss evaluating regression models
● Regression is a task where a model attempts to predict continuous values (unlike categorical values, which is classification)
Evaluating Regression
● You may have heard of evaluation metrics like accuracy or recall.
● These sorts of metrics aren't useful for regression problems; we need metrics designed for continuous values!
Evaluating Regression
● For example, attempting to predict the price of
a house given its features is a regression
task.
● Attempting to predict the country a house is in
given its features would be a classification
task.
Evaluating Regression
● Most common evaluation metrics:
○ Mean Absolute Error (MAE)
○ Mean Squared Error (MSE)
○ Root Mean Square Error (RMSE)
Evaluating Regression: Mean Absolute Error (MAE)
• This is the mean of the absolute value of the errors: MAE = (1/n) Σ |y_i - ŷ_i|
• Easy to understand
Evaluating Regression
● MAE won't punish large errors, however.
● Mean Squared Error (MSE) is the mean of the squared errors: MSE = (1/n) Σ (y_i - ŷ_i)². Large errors are punished more, but the units are the square of y's units.
Evaluating Regression: Root Mean Square Error (RMSE)
• This is the root of the mean of the squared errors: RMSE = √((1/n) Σ (y_i - ŷ_i)²)
• Most popular (has the same units as y)
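A minimal sketch computing MAE, MSE, and RMSE with scikit-learn and NumPy; the house-price numbers are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical house prices (in $1000s) vs. model predictions.
y_true = np.array([200, 340, 410, 500])
y_pred = np.array([210, 330, 450, 480])

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error^2
rmse = np.sqrt(mse)                        # back to y's units
print(mae, mse, rmse)
```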
Machine Learning
● Clustering
○ Grouping together unlabeled data points
into categories/clusters
○ Data points are assigned to a cluster based
on similarity
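A minimal K-Means clustering sketch with scikit-learn; the six 2-D points are invented and form two obvious groups, so the algorithm can assign each point to a cluster by similarity.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabelled 2-D points forming two loose groups.
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [10, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment per point
print(kmeans.cluster_centers_)  # learned cluster centres
```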
Machine Learning
● Anomaly Detection
○ Attempts to detect outliers in a dataset
○ For example, fraudulent transactions on a
credit card.
Machine Learning
● Dimensionality Reduction
  ○ Data processing techniques that reduce the number of features in a data set, either for compression or to better understand underlying trends within a data set.
Machine Learning
● Unsupervised Learning
○ It’s important to note, these are situations
where we don’t have the correct answer
for historical data!
○ Which means evaluation is much harder
and more nuanced!
Unsupervised Process
Data Acquisition → Data Cleaning → Model Training & Building → Transformation → Model Deployment
(with held-out Test Data)