
21CSC305P - MACHINE LEARNING
UNIT-I

Machine learning - What and Why, Supervised Learning, Unsupervised learning, Polynomial curve fitting, Probability theory - discrete random variables, Fundamental rules, Bayes rule, Independence and conditional independence, Continuous random variables, Quantiles, mean and variance, Probability densities, Expectation and covariance
Machine learning - What and Why
• The rise of big data demands machine learning for efficient data analysis and
decision-making.
• For instance, there are around 1 trillion web pages, and every second, one hour of
video content is uploaded to YouTube, equating to 10 years of content every day.
Additionally, thousands of human genomes, each consisting of approximately 3.8
billion base pairs, have been sequenced, and Walmart handles over 1 million
transactions per hour, resulting in databases containing more than 2.5 petabytes of
information.
• Machine learning comprises techniques that can automatically detect patterns within
data and leverage these patterns to predict future data or make decisions under
uncertainty.
• The optimal approach to addressing such challenges is through probability theory,
which applies to any problem involving uncertainty.

Artificial Intelligence (AI)
• Programs with the ability to learn and reason like humans

Machine Learning (ML)
• Algorithms with the ability to learn without being explicitly programmed
• (a subset of AI)

Deep Learning (DL)
• Artificial neural networks that adapt and learn from vast amounts of data
• (a subset of ML)
Machine perception is closely analogous to human perception

• How did we learn the alphabet?

➢ We can recognize all the letters of the alphabet through training or learning: because of this training we can recognize letters, and when a new letter appears, our intelligence lets us recognize it.

• How can we give machines the same capability to recognize letters?

➢ We have to train the system (i.e., an ML or pattern recognition system). After training, with the help of the trained model, it becomes possible to recognize letters.

Learning is the essential step. After training or learning, we move on to testing.
Pattern Recognition

• It is the process of recognizing patterns using ML algorithms. This approach is an offshoot of ML that uses data analysis to identify patterns.

• Typical pipeline: Sensors → Feature Generation → Feature Extraction → Classifier Design → System Evaluation

• Applications:
✓ Optical Character Recognition: handwritten and printed text
✓ Biometrics: face, fingerprints, speech recognition
✓ Diagnostic Systems: X-ray, ECG analysis, etc.
✓ Military Domain: automated target recognition, aerial or satellite image segmentation

A neural network is a soft-computing-based approach to pattern recognition.
Computer Vision

• It is a field of computer science that works on enabling computers to see, identify, and process image data (i.e., images) in the same way that human vision does, and then provide appropriate output.

• Typical pipeline: Camera (Image Acquisition) → Image Processing (Pre-Processing) → Pattern Recognition (AI/ML Algorithm) → Decision Making

• For computer vision, the input is images or videos. Therefore, to capture the image one needs an image acquisition device such as a camera. After that, some pre-processing techniques are applied. Finally, ML algorithms are used for decision making.
Types of Machine Learning

• Supervised: learning with a labelled training set, i.e., a dataset with proper label information
➢ (e.g., email classification, house rent price prediction)

• Unsupervised: discover patterns in unlabelled data, where no label information is given
➢ (e.g., cluster similar documents based on their text)

• Reinforcement Learning: learn to act based on feedback/reward
➢ (e.g., learning to play Go; reward: win or lose) i.e., for a particular good action, the agent receives a reward. Here no single action is important in isolation; the sequence of actions matters more.
Semi-Supervised Learning

• This approach is suited to a small amount of labelled data and a large amount of unlabelled data.

• It is a class of supervised learning tasks and techniques that also makes use of unlabelled data for training.

• This approach falls between unsupervised learning and supervised learning.

❑ Example: For medical image classification, it is very difficult to obtain labelled data. In such cases, we may have a small amount of labelled data and a large amount of unlabelled data.
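As an illustration, here is a minimal sketch of semi-supervised classification using scikit-learn's LabelPropagation, which treats samples labelled -1 as unlabelled (the dataset and the 50-sample labelled split are arbitrary choices for this example):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelPropagation

X, y = load_digits(return_X_y=True)

# Pretend only the first 50 samples are labelled; mark the rest with -1,
# which scikit-learn's semi-supervised estimators treat as "unlabelled".
y_partial = np.copy(y)
y_partial[50:] = -1

model = LabelPropagation(kernel="knn", n_neighbors=10)
model.fit(X, y_partial)

# Labels inferred for the unlabelled portion during training
accuracy = (model.transduction_[50:] == y[50:]).mean()
print(f"Accuracy on the unlabelled portion: {accuracy:.2f}")
```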
Machine Learning

• Supervised Learning
➢ Regression (continuous variable): Linear, Polynomial
➢ Classification (categorical variable)
• Unsupervised Learning
➢ Clustering: PCA, K-means
➢ Association: Apriori, FP-Growth
• Reinforcement Learning
Predictive or Supervised Learning:
• Goal: Learn a mapping from inputs x to outputs y, given a labeled set of input-output pairs D = {(xi, yi)}, i = 1, …, N.
• Training set D: the set of input-output pairs; N: the number of training examples.
• Training input xi:
  • Typically a D-dimensional vector of numbers.
  • Represents features, attributes, or covariates (e.g., the height and weight of a person).
  • Can be a complex structured object (e.g., an image, sentence, time series, molecular shape, or graph).
• Output or response variable yi:
  • Can be categorical/nominal (e.g., male or female) or real-valued (e.g., income level).
  • Categorical problems are known as classification or pattern recognition.
  • Real-valued problems are known as regression.
  • Ordinal regression: the label space Y has a natural ordering (e.g., grades A-F).
Descriptive or Unsupervised Learning:
• Goal: Find "interesting patterns" in the data.
• Given only inputs D = {xi}, i = 1, …, N. Also known as knowledge discovery.
• The problem is less well defined, since the patterns are not specified in advance.
Reinforcement Learning:
• Useful for learning how to act or behave when given occasional reward or punishment signals.
• Example: how a baby learns to walk.
Supervised learning
1. Classification:
• Goal of classification: learn a mapping from inputs x to outputs y, where y ∈ {1, …, C} with C being the number of classes.
• Binary classification: when C = 2 (e.g., y ∈ {0, 1}).
• Multiclass classification: when C > 2.
• Multi-label classification: when class labels are not mutually exclusive (e.g., someone classified as both tall and strong); here we predict multiple related binary class labels (a multiple-output model).
• One way to formalize the problem is as function approximation. Assume y = f(x) for an unknown function f; learning aims to estimate f using a labeled training set, and prediction is then made with ŷ = f̂(x).
• Generalization: the main goal is to make predictions on novel inputs not seen before, emphasizing the importance of generalization over merely fitting the training set.
Supervised Learning

[Figure: labelled images (Car, Bus, Flight, Ship) are used in supervised learning to train a predictive model; the trained model then labels a new, unseen image as "Car".]
Supervised Learning

[Figure: labelled face images (Dhoni, Virat, Sunil, Anand) train a predictive model; the trained model then recognizes a new face image as "Dhoni".]
Classification Analysis

Multi-class Classification: Emotion Analysis

Sl. No.   Emotion     Class Label
1.        Anger       0
2.        Disgust     1
3.        Fear        2
4.        Happiness   3
5.        Neutral     4
6.        Sadness     5
Supervised learning - Cont.
Example
• Two classes of objects with labels 0 and 1.
• Inputs are colored shapes, described by D features or attributes.
• The features are stored in an N × D design matrix.
• The input features x can be discrete, continuous, or both; y is the vector of training labels.
• Test objects: a blue crescent, a yellow circle, and a blue arrow.
• These test objects have not been seen before, so the model must generalize beyond the training set.
Generalization:
• The blue crescent likely has y = 1, since all blue shapes in the training set are labeled 1.
• The yellow circle's label is unclear, due to mixed labels for yellow objects and for circles.
• The blue arrow's label is also unclear, because the training set provides no specific information about arrows.
Supervised learning - Cont.

The need for probabilistic predictions:

• In classification, ambiguous cases should be handled by returning a probability distribution over possible labels given the input and training set, denoted p(y|x, D).
• The best guess is computed as the most probable class label, ŷ = argmax over c = 1, …, C of p(y = c | x, D).
• This is the mode of the distribution p(y|x, D), known as the MAP estimate (maximum a posteriori).
• Confidence in predictions is crucial, especially in risk-averse domains like medicine and finance.
• IBM's Watson for Jeopardy! uses a confidence module to decide when to attempt an answer.
• Google's SmartASS (ad selection system) predicts the click-through rate (CTR) of ads in order to maximize expected profit.
• Systems like Watson and SmartASS assess the risk of their predictions, making decisions based on confidence levels to optimize performance and minimize errors.
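To make the MAP rule concrete, here is a tiny sketch (with made-up class probabilities) of picking the most probable label from p(y|x, D):

```python
import numpy as np

# Hypothetical posterior p(y = c | x, D) over C = 3 classes for one input x
posterior = np.array([0.15, 0.70, 0.15])

y_hat = np.argmax(posterior)      # MAP estimate: the mode of the posterior
confidence = posterior[y_hat]     # how sure the model is about this guess

print(f"Predicted class: {y_hat}, confidence: {confidence:.2f}")
# A risk-averse system (like Watson) might refuse to answer when confidence
# falls below some threshold.
```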
Supervised learning - Cont.

Real-world applications:
(i) Document classification and email spam filtering
• In document classification, the primary objective is to categorize documents such as web pages or email messages into one of C predefined classes, i.e., to determine p(y = c | x, D), where x is a representation of the document's text.
• A classic example is email spam filtering, where the classes are spam (y = 1) and non-spam (y = 0).
• Most classifiers assume a fixed-size input vector x. To handle variable-length documents, a common approach is the bag of words (BoW) representation.
• Bag of Words (BoW):
  • Documents are transformed into fixed-size feature vectors.
  • Each vector element corresponds to a word from a predefined vocabulary.
  • If a word appears in the document, its corresponding vector element is set to 1; otherwise it remains 0.
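A minimal sketch of the binary bag-of-words encoding described above (the toy vocabulary and documents are invented for illustration):

```python
# Toy vocabulary and documents, invented for illustration
vocabulary = ["cheap", "meeting", "offer", "project", "viagra"]
documents = [
    "cheap viagra offer",          # looks like spam
    "project meeting tomorrow",    # looks like ham
]

def bag_of_words(text, vocab):
    """Binary BoW: 1 if the vocabulary word occurs in the document, else 0."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

for doc in documents:
    print(doc, "->", bag_of_words(doc, vocabulary))
# cheap viagra offer       -> [1, 0, 1, 0, 1]
# project meeting tomorrow -> [0, 1, 0, 1, 0]
```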
Supervised learning - Cont.
(ii) Classifying flowers
• The goal is to classify iris flowers into three types: setosa, versicolor, and virginica, based on four extracted features: sepal length, sepal width, petal length, and petal width.
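As a quick illustration, a minimal scikit-learn sketch for this task (logistic regression is an arbitrary choice here; any classifier would do):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```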
Supervised learning - Cont.

(iii) Image classification and handwriting recognition
• Image classification involves categorizing images based on their content, such as indoor vs. outdoor scenes, orientation (horizontal vs. vertical), or the presence of specific objects like dogs.
• MNIST (which stands for "Modified National Institute of Standards and Technology") is a widely used dataset for handwritten digit recognition, containing 60,000 training images and 10,000 test images of the digits 0-9.
• Each image is grayscale, sized 28×28 pixels, and shows a digit handwritten by one of various individuals.
• Images are represented as feature vectors, where each pixel's grayscale value (ranging from 0 to 255) serves as a feature.
Supervised learning - Cont.
(iv) Face detection and recognition
• Object detection, or localization, involves
identifying specific objects within an image. A
notable application is face detection, which is
crucial for tasks like autofocus in cameras and
privacy features in services like Google's
StreetView.
• One approach to face detection is the sliding
window detector method. It divides the image into
small overlapping patches at various locations,
scales, and orientations.
• Each patch is classified based on whether it
exhibits face-like textures or features. Locations
where the probability of containing a face is high
are identified as potential face locations.
• Modern digital cameras often integrate face
detection systems to assist with autofocus by
identifying and focusing on faces within the frame.
• Services like Google's StreetView use face
detection to automatically blur faces to protect
privacy.
Supervised learning - Cont.

2. Regression:
• Regression is just like classification, except that the response variable is continuous.

Here are some examples of real-world regression problems:
• Predict tomorrow's stock market price given current market conditions and other possible side information.
• Predict the age of a viewer watching a given video on YouTube.
• Predict the location in 3D space of a robot arm's end effector, given the control signals (torques) sent to its various motors.
• Predict the amount of prostate-specific antigen (PSA) in the body as a function of a number of different clinical measurements.
• Predict the temperature at any location inside a building using weather data, time, door sensors, etc.
Performance Analysis of ML models: Classification and Regression

Performance Analysis of ML models: Classification
E.g., for binary classification, a model predicts two classes, 'spam' and 'not_spam', for inbox mail:

                        Prediction
                    spam               not_spam
Actual  spam        True Positive      False Negative
                    (TP)               (FN)
        not_spam    False Positive     True Negative
                    (FP)               (TN)

• TP: how many times spam is recognized as spam
• FN: how many times spam is recognized as not spam
• FP: how many times not spam is recognized as spam
• TN: how many times not spam is recognized as not spam

• True Positive: number of correct matches
• False Negative: matches that are not correctly detected
• False Positive: matches that are incorrect
• True Negative: non-matches that are correctly rejected
Performance Analysis of ML models: Classification (cont.)

• Train the model with a learning algorithm on a training set, then validate it using a validation set (held-out data).

• During training, we have to perform hyperparameter tuning to tune the ML model.

• Try to avoid overfitting and underfitting.

• Finally, we evaluate the performance of the model on unseen testing samples, as sketched below.
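A minimal sketch of this train / tune / test workflow with scikit-learn (the dataset, model, and hyperparameter grid are arbitrary choices for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that is never touched during training or tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Hyperparameter tuning via cross-validation on the training set
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Final evaluation on unseen testing samples
print("Best C:", grid.best_params_["C"])
print(f"Test accuracy: {grid.score(X_test, y_test):.3f}")
```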
• A sufficiently complex model can represent the training samples perfectly, yet its error during testing on unseen data will be high, because it cannot handle data it has not seen. This is the case of overfitting.
Performance Analysis of ML models: Classification (cont.)

• Accuracy: the number of correctly classified examples divided by the total number of classified examples.

  Accuracy = (TP + TN) / (TP + TN + FP + FN)

• Precision: the ratio of correct positive predictions to the overall number of positive predictions.

  Precision = TP / (TP + FP)

• Recall: the ratio of correct positive predictions to the overall number of positive examples.

  Recall = TP / (TP + FN)

• F1 Score: the harmonic mean of Precision and Recall. A perfect model has an F1 score of 1.

  F1 Score = (2 × Precision × Recall) / (Precision + Recall)
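A small self-contained sketch computing these four metrics from confusion-matrix counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Counts from the worked example later in this section: TP=6, FN=2, FP=1, TN=3
print(classification_metrics(tp=6, tn=3, fp=1, fn=2))
# {'accuracy': 0.75, 'precision': 0.8571..., 'recall': 0.75, 'f1': 0.8}
```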
Performance Analysis of ML models: Classification (cont.)
# Whether Precision or Recall matters more depends on the specific problem.

(1) Problem: diagnosis of cancer

                        Prediction
                    cancer             not_cancer
Actual  cancer      True Positive      False Negative
                    (Perfect)          (High Risk)
        not_cancer  False Positive     True Negative
                    (Low Risk?)        (Perfect)

• In such cases the model may raise a false alarm, but actual positive cases must not go undetected.
• (Recall is important)

(2) Problem: detecting whether an email is spam or not spam

                        Prediction
                    spam               not_spam
Actual  spam        True Positive      False Negative
                    (Perfect)          (Low Risk?)
        not_spam    False Positive     True Negative
                    (High Risk)        (Perfect)

• It is more important not to lose an important email by marking it as spam than to occasionally receive a spam message as non-spam.
• (Precision is important)
Performance Analysis of ML models: Classification (cont.)
Calculate: Accuracy, Precision, Recall, F1-Score

                    Prediction
                1                  0
Actual  1       True Positive      False Negative
                (TP = 6)           (FN = 2)
        0       False Positive     True Negative
                (FP = 1)           (TN = 3)

• How many times a particular class is recognized, and how many times it is not recognized: based on this we can determine the entries of the confusion matrix.

Answer:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 9/12 = 0.75
Precision = TP / (TP + FP) = 6/7 ≈ 0.86
Recall = TP / (TP + FN) = 6/8 = 0.75
F1 Score = (2 × Precision × Recall) / (Precision + Recall) = 0.80
Performance Analysis of ML models: Classification (cont.)
# Consider an India vs England Test cricket tournament, with "India wins" as the positive class. Find the mapping through the confusion matrix.

                    Prediction
                India              England
Actual  India   True Positive      False Negative
                (TP)               (FN)
        England False Positive     True Negative
                (FP)               (TN)

• Condition 1 (True Positive): you had predicted that India would win, and it won.
• Condition 2 (True Negative): you had predicted that India would not win, and it lost.
• Condition 3 (False Positive): you had predicted that India would win, but it lost.
• Condition 4 (False Negative): you had predicted that India would not win, but it won.
Performance Analysis of ML models: Classification (cont.)
# Consider the actual and predicted classification results below. Find the mapping through the confusion matrix.

Actual:     1  1  1  1  1  1  1  1  0  0  0  0
Predicted:  0  0  1  1  1  1  1  1  1  0  0  0
Result:     ?  ?  ?  ?  ?  ?  ?  ?  ?  ?  ?  ?

                    Prediction
                1                  0
Actual  1       True Positive      False Negative
                (TP = ?)           (FN = ?)
        0       False Positive     True Negative
                (FP = ?)           (TN = ?)
Performance Analysis of ML models: Classification (cont.)
# Solution: each (actual, predicted) pair maps to one confusion-matrix entry.

Actual:     1  1  1  1  1  1  1  1  0  0  0  0
Predicted:  0  0  1  1  1  1  1  1  1  0  0  0
Result:     FN FN TP TP TP TP TP TP FP TN TN TN

                    Prediction
                1                  0
Actual  1       True Positive      False Negative
                (TP = 6)           (FN = 2)
        0       False Positive     True Negative
                (FP = 1)           (TN = 3)
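A minimal sketch that reproduces this mapping in code:

```python
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Map each (actual, predicted) pair to its confusion-matrix cell
cell = {(1, 1): "TP", (1, 0): "FN", (0, 1): "FP", (0, 0): "TN"}
results = [cell[(a, p)] for a, p in zip(actual, predicted)]
print(results)
# ['FN', 'FN', 'TP', 'TP', 'TP', 'TP', 'TP', 'TP', 'FP', 'TN', 'TN', 'TN']

counts = {k: results.count(k) for k in ("TP", "FN", "FP", "TN")}
print(counts)  # {'TP': 6, 'FN': 2, 'FP': 1, 'TN': 3}
```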
Performance Analysis of ML models: Classification (cont.)

# Multi-class classification (e.g., emotion analysis). Find the mapping through the confusion matrix, shown here from the perspective of the Anger class; diagonal entries are correct predictions of each class. Rows are actual classes, columns are predicted classes.

             Anger   Disgust   Fear   Happiness   Neutral   Sadness
Anger         TP       FN       FN       FN         FN        FN
Disgust       FP       TP
Fear          FP                TP
Happiness     FP                         TP
Neutral       FP                                    TP
Sadness       FP                                              TP
Performance Analysis of ML models: Classification (cont.)
Area under the ROC Curve (AUC):

• The ROC (Receiver Operating Characteristic) curve is a commonly used method to assess the performance of binary classification models.

• It uses the combination of TPR (the proportion of positive examples predicted correctly, defined exactly as Recall) and FPR (the proportion of negative examples predicted incorrectly):

  TPR = TP / (TP + FN)    and    FPR = FP / (FP + TN)

• To compare different classifiers, it can be useful to summarize the performance of each classifier into a single measure, such as the area under the ROC curve.

• Another related parameter is Specificity = 1 − FPR.
Performance Analysis of ML models: Classification (cont.)
Calculate: TPR, FPR

(a) All predictions say 'spam':

                     Prediction
                 spam               not_spam
Actual  spam     True Positive      False Negative
                 (TP = 10)          (FN = 0)
        not_spam False Positive     True Negative
                 (FP = 10)          (TN = 0)

(b) All predictions say 'not_spam':

                     Prediction
                 spam               not_spam
Actual  spam     True Positive      False Negative
                 (TP = 0)           (FN = 10)
        not_spam False Positive     True Negative
                 (FP = 0)           (TN = 10)
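A quick sketch checking both degenerate classifiers; note that the always-'spam' model attains TPR = 1 only at the cost of FPR = 1, and these two points are the extreme corners of the ROC curve:

```python
def tpr_fpr(tp, fn, fp, tn):
    """True positive rate and false positive rate from confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)

print(tpr_fpr(tp=10, fn=0, fp=10, tn=0))  # always 'spam'     -> (1.0, 1.0)
print(tpr_fpr(tp=0, fn=10, fp=0, tn=10))  # always 'not_spam' -> (0.0, 0.0)
```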
Performance Analysis of ML models: Regression

• R² measures the proportion of the variance in the dependent variable that is explained by the independent variable(s). The R² value lies in the range 0 to 1 and is estimated as:

  R² = 1 − Σj (yj − ŷj)² / Σj (yj − ȳ)²

  where yj is the actual value, ŷj the predicted value, and ȳ the mean of the actual values.

• The Mean Absolute Error (MAE) measures the average absolute difference between the actual data and the predicted data. The MAE is estimated as:

  MAE = (1/n) Σj |yj − ŷj|

• The Mean Squared Error (MSE) estimates the average squared difference between the predicted responses and the actual responses. The MSE is estimated as:

  MSE = (1/n) Σj (ŷj − yj)²

• The Root Mean Squared Error (RMSE) is the square root of the average squared difference between the actual data and the predicted data. A lower RMSE confirms better predictive performance of the model. The RMSE is measured as:

  RMSE = √[ (1/n) Σj (ŷj − yj)² ]
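A minimal numpy sketch of these four estimators (the arrays are toy values invented for illustration):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (toy data)
y_pred = np.array([2.8, 5.3, 2.9, 6.8])   # predicted values (toy data)

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```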
Unsupervised Learning:

• The system forms clusters or natural groupings of the input patterns.

• Data are organized into classes such that there is high intra-class similarity and low inter-class similarity.

• The class labels and the number of classes are found directly from the data.

• Clustering is fundamentally an unsupervised learning approach, since no class values denoting an a priori grouping of the data instances are given.

[Figure: classification of supervised (labelled) data contrasted with clustering of unsupervised (unlabelled) data.]


Unsupervised learning
• The goal is to discover "interesting structure" in the data; this is sometimes called knowledge discovery.
• Unsupervised learning formalizes the task as density estimation, aiming to model the probability distribution p(xi|θ) of input data xi given parameters θ. Unlike supervised learning, where the focus is on predicting yi given xi and θ, unsupervised learning directly estimates the density p(xi|θ).
• Supervised learning involves conditional density estimation p(yi|xi, θ), where yi is the target variable. In contrast, unsupervised learning performs unconditional density estimation p(xi|θ), where xi is a feature vector.
• In unsupervised learning, xi is typically a vector of features, necessitating multivariate probability models that capture dependencies between different features.
• Supervised learning often uses simpler univariate probability models with input-dependent parameters, focusing on predicting a single variable yi. This simplification is not applicable in unsupervised settings due to the absence of labeled output.
• Unsupervised learning is more widely applicable than supervised learning, since it does not require costly and often scarce labeled data, making it feasible for modeling complex systems where labeled data is limited or unavailable.
Unsupervised learning - Cont.

1. Discovering clusters:
• Clustering involves grouping data points into clusters based on similarities in their features, without predefined labels.
• The first goal is to estimate the distribution p(K|D) over the number of clusters K, indicating the presence of subgroups within the data.
• Model selection in clustering aims to determine the optimal number of clusters K*, often approximated by the mode of p(K|D). Unlike supervised learning, where the classes are predefined, unsupervised learning allows flexibility in choosing the number of clusters that best represent the underlying structure of the data.
• Each data point i is assigned to a cluster zi ∈ {1, …, K} based on the probability p(zi = k | xi, D), where xi is the feature vector of the data point.
• The inferred assignments zi* determine the cluster membership of each data point, often illustrated by different colors representing clusters in visualizations. A minimal clustering sketch follows below.
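A minimal sketch of clustering with scikit-learn's KMeans (here K is fixed to 3 by hand; in practice K would be chosen by model selection as described above):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy unlabelled data: 300 points drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster assignments z_i for the first 10 points
print(kmeans.cluster_centers_)   # the 3 discovered cluster centers
```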
Unsupervised learning - Cont.

Applications of Clustering:
•Astronomy: Clustering methods like Autoclass have been used to discover new
types of stars based on astrophysical measurements.
•E-commerce: Clustering users based on purchasing or web-surfing behavior
allows for targeted advertising and personalized recommendations.
•Biology: Clustering flow-cytometry data helps identify different sub-populations of
cells, aiding in biological research such as understanding disease mechanisms.
Unsupervised learning - Cont.
2. Discovering latent factors:
• Dimensionality reduction involves projecting high-dimensional data into a lower-dimensional subspace that captures the essential characteristics of the data.
• Despite their high-dimensional appearance, data often vary along a smaller number of latent factors. Dimensionality reduction helps focus on these key factors, such as lighting, pose, or identity in face image modeling.
• PCA (Principal Component Analysis) is a common approach to dimensionality reduction, resembling an unsupervised form of multi-output linear regression.
• Given high-dimensional responses y, PCA infers latent low-dimensional factors z that explain most of the variability in y.
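A minimal sketch of PCA with scikit-learn, projecting the 4-dimensional iris features onto 2 latent factors:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # 150 samples x 4 features

pca = PCA(n_components=2)                # keep 2 latent factors
Z = pca.fit_transform(X)                 # low-dimensional representation z

print(Z.shape)                           # (150, 2)
print(pca.explained_variance_ratio_)     # variance explained by each factor
```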
Unsupervised learning - Cont.

Applications:

• In biology, it is common to use PCA to interpret gene microarray data, to account for the
fact that each measurement is usually the result of many genes which are correlated in
their behavior by the fact that they belong to different biological pathways.
• In natural language processing, it is common to use a variant of PCA called latent
semantic analysis for document retrieval.
• In signal processing (e.g., of acoustic or neural signals), it is common to use ICA (which
is a variant of PCA) to separate signals into their different sources.
• In computer graphics, it is common to project motion capture data to a low dimensional
space, and use it to create animations.

Unsupervised learning - Cont.
3. Discovering graph structure:
• Learning sparse graphical models involves representing relationships between correlated
variables using a graph G, where nodes depict variables and edges denote direct
dependencies. This approach is pivotal in both discovering new knowledge and enhancing
joint probability density estimators.
• In systems biology, sparse graphical models are used to uncover relationships among
biological entities. For instance, graphs derived from protein phosphorylation data reveal
complex interactions within cellular networks. Similarly, neural wiring diagrams in birds can
be reconstructed from EEG data, highlighting functional connectivity patterns.
• In fields like financial portfolio management, sparse graphs help model covariance between
stocks for better prediction and decision-making. Utilizing sparse graph structures has proven
beneficial in outperforming traditional methods, enabling more effective trading strategies.
• Applications extend to traffic prediction systems, such as JamBayes, which leverage learned
graphical models to forecast traffic flow dynamics. These models contribute to accurate
predictions and efficient management of transportation networks, illustrating the broad
applicability and utility of sparse graphical learning in real-world scenarios.
Unsupervised learning - Cont.
4. Matrix completion:
• Sometimes we have missing data, that is, variables whose values are unknown. For example, we might have
conducted a survey, and some people might not have answered certain questions.
• The corresponding design matrix will then have “holes” in it; these missing entries are often represented by
NaN, which stands for “not a number”. The goal of imputation is to infer plausible values for the missing
entries. This is sometimes called matrix completion.
• Image Inpainting: Technique to fill in missing parts of images due to scratches or occlusions, achieved by
modeling joint probability of pixels from clean images.
• Collaborative Filtering: predicting user preferences for items (like movies) from sparse ratings matrices, aiming to fill in the missing ratings for better recommendation systems (a toy sketch follows after this list).
• Market basket analysis:
❖ Involves examining a large, sparse binary matrix where columns represent items/products and rows
represent transactions.
❖ Each entry in the matrix indicates whether an item was purchased in a specific transaction. By analyzing
correlations among items often bought together, predictions can be made about additional items a consumer
might buy based on partial transaction data.
❖ This technique is also applicable in other domains, such as predicting file dependencies in software systems.
❖ Common methods for market basket analysis include frequent itemset mining, which generates association
rules, and probabilistic modeling, which fits a joint density model to the data.
❖ Data mining emphasizes interpretability of models, whereas machine learning focuses on model accuracy.
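As a toy illustration of matrix completion, here is a sketch of low-rank matrix factorization for collaborative filtering (the ratings, rank, and learning rate are invented; real systems are far more elaborate):

```python
import numpy as np

# Toy user x item ratings matrix; NaN marks the missing entries to impute
R = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, 5.0, np.nan],
              [np.nan, 1.0, 4.0, 5.0]])

mask = ~np.isnan(R)                            # observed entries only
n_users, n_items, k = R.shape[0], R.shape[1], 2

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))   # latent user factors
V = rng.normal(scale=0.1, size=(n_items, k))   # latent item factors

# Gradient descent on the squared error over the observed entries
for _ in range(2000):
    E = np.where(mask, R - U @ V.T, 0.0)       # error on observed cells
    U += 0.01 * (E @ V)
    V += 0.01 * (E.T @ U)

print(np.round(U @ V.T, 1))                    # NaN cells are now imputed
```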
Imbalanced Dataset – Challenges and Solutions