Machine Learning
Prepared By:
Dr. Sara Sweidan
The origins of machine learning
Machine Learning AIx11 2
• Machine learning is the development of computer algorithms that transform data into intelligent action.
machine learning basics
• Machine Learning is a branch of artificial intelligence concerned with developing algorithms
that allow a computer to learn automatically from past data and experiences.
• Machine learning builds models to make predictions using historical data or information.
• Historical data: Known as training data.
• Whenever the model receives new data, it predicts the corresponding output.
• The accuracy of predictions depends in part on the amount and quality of the training data: more representative data generally → better predictions.
machine learning basics
Why Make Machines Learn?
• Lack of sufficient human expertise in a domain (e.g., simulating navigation in unknown
territories or even on other planets).
• Scenarios and behavior can keep changing over time (e.g., availability of infrastructure in
an organization, network connectivity, and so on).
• Humans have sufficient expertise in the domain but it is extremely difficult to formally
explain or translate this expertise into computational tasks (e.g., speech recognition,
translation, scene recognition, cognitive tasks, and so on).
• Domain-specific problems must be addressed at scale, with huge volumes of data and too many
complex conditions and constraints.
Machine learning features
• Learning-based agent.
• It can learn from past data and improve automatically.
• Machine learning can deal with huge amounts of data.
machine learning basics
Formal Definition
The idea of Machine Learning is that a learning algorithm helps the machine
learn from data. Professor Tom Mitchell defined it as follows.
“A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves
with experience E.”
Defining task T
• Classification: classify animal images into dogs and cats.
• Regression: predict house prices.
• Anomaly detection: indication of fraud.
• Structured annotation: text annotation such as grammar, sentiment, or named entities; image annotation such as annotating specific areas of images.
• Translation: natural language translation.
• Clustering or grouping: group similar products, events, or entities.
• Transcription: these tasks usually entail various representations of data that are usually continuous and unstructured, and converting them into more structured and discrete data elements. Examples include speech to text, optical character recognition, and images to text.
Defining experience E
• The process of consuming a dataset that consists of data samples or data points, such that a learning algorithm or model learns inherent patterns, is defined as the experience, E, which is gained by the learning algorithm.
Defining performance P
• Performance measures include accuracy, precision, recall, F1 score, sensitivity, specificity, error rate, and misclassification rate.
• Performance measures are usually evaluated on training data samples (used by the algorithm to gain experience, E) as well as on data samples that it has not seen or learned from before, which are usually known as validation and test data samples.
• The idea behind this is to generalize the algorithm so that it doesn’t become too biased only on the training data.
A Multi-Disciplinary Field
major facets under the AI umbrella
Machine learning successes
A survey of recent success stories includes several prominent applications:
• Identification of unwanted spam messages in e-mail
• Segmentation of customer behavior for targeted advertising
• Forecasts of weather behavior and long-term climate changes
• Reduction of fraudulent credit card transactions
• Actuarial estimates of financial damage of storms and natural disasters
• Prediction of popular election outcomes
• Development of algorithms for auto-piloting drones and self-driving cars
• Optimization of energy use in homes and office buildings
• Projection of areas where criminal activity is most likely
• Discovery of genetic sequences linked to diseases
Machine learning challenges
• Data quality issues lead to problems, especially with regard to data processing and
feature extraction.
• Data acquisition, extraction, and retrieval are extremely tedious and time-consuming
processes.
• Lack of good quality and sufficient training data in many scenarios.
• Formulating business problems clearly with well-defined goals and objectives.
• Feature extraction and engineering, especially hand-crafting features, is one of the
most difficult yet important tasks in Machine Learning.
• Overfitting or underfitting causes the model to learn poor representations
and relationships from the training data, which degrades performance.
• Choosing a statistical model that fits the data.
• Complex models can be difficult to deploy in the real world.
Machine learning limitations
• Machine learning has very little flexibility to extrapolate outside of the strict parameters it
learned, and it has no common sense.
• One should be extremely careful to recognize exactly what the algorithm has
learned before setting it loose in real-world settings.
• Without a lifetime of past experiences to build upon, computers are also limited
in their ability to make simple common-sense inferences about logical next steps.
How machines learn
• Data storage utilizes observation, memory, and recall to provide a factual basis for
further reasoning.
• Abstraction involves the translation of stored data into broader representations and
concepts.
• Generalization uses abstracted data to create knowledge and inferences that drive
action in new contexts.
• Evaluation provides a feedback mechanism to measure the utility of learned knowledge
and inform potential improvements.
Data Storage
• All learning must begin with data. Humans and computers alike utilize data storage as a
foundation for more advanced reasoning.
• In a human being, this consists of a brain that uses electrochemical signals in a network of
biological cells to store and process observations for short- and long-term future recall.
• Computers have similar capabilities of short- and long-term recall using hard disk drives,
flash memory, and random-access memory (RAM) in combination with a central processing unit (CPU).
Abstraction
• The work of assigning meaning to stored data occurs during the abstraction process,
in which raw data comes to have a more abstract meaning, such as the connection
between an object and its representation.
• During a machine's process of knowledge representation, the computer summarizes
stored raw data using a model, an explicit description of the patterns within the data.
• There are many different types of models. You may be already familiar with some.
Examples include:
• Mathematical equations
• Relational diagrams such as trees and graphs
• Logical if/else rules
• Groupings of data known as clusters
• The process of fitting a model to a dataset is known as training. When the model has
been trained, the data is transformed into an abstract form that summarizes the original
information.
➢ Note: You might wonder why this step is called training rather than learning.
First, note that the process of learning does not end with data abstraction; the learner
must still generalize and evaluate its training.
Second, the word training better connotes the fact that the human teacher trains the
machine student to understand the data in a specific way.
Generalization
• The term generalization describes the process of turning abstracted knowledge into a
form that can be utilized for future action, on tasks that are similar, but not identical, to
those it has seen before.
• In generalization, the learner is tasked with limiting the patterns it discovers to only
those that will be most relevant to its future tasks.
• The algorithm is said to have a bias if its conclusions are systematically erroneous, or
wrong in a predictable manner.
Evaluation
• Bias is a necessary evil associated with the abstraction and generalization processes
inherent in any learning task.
• Therefore, the final step in the generalization process is to evaluate or measure the
learner's success in spite of its biases and use this information to inform additional
training if needed.
• Generally, evaluation occurs after a model has been trained on an initial training
dataset. Then, the model is evaluated on a new test dataset to judge how well its
characterization of the training data generalizes to new, unseen data. It’s worth noting
that it is exceedingly rare for a model to generalize to every unforeseen case perfectly.
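The train-then-test procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data, the threshold "model", and the helper names train_test_split, accuracy, and fit_threshold are hypothetical and do not come from the slides.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, samples):
    """Fraction of (x, label) pairs for which model(x) equals the label."""
    correct = sum(1 for x, label in samples if model(x) == label)
    return correct / len(samples)

def fit_threshold(samples):
    """'Training': pick the threshold with the highest training accuracy."""
    candidates = sorted(x for x, _ in samples)
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), samples))
    return lambda x: int(x >= best)

# Hypothetical toy data: the label is 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(20)]
train, test = train_test_split(data)

model = fit_threshold(train)
print("training accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))  # measured on unseen data
```

The test accuracy, not the training accuracy, is the number that indicates how well the model generalizes.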
In part, models fail to generalize perfectly due to the problem of noise. Noisy data is
caused by seemingly random events, such as:
• Measurement error due to imprecise sensors that sometimes add or subtract a
bit from the readings.
• Issues with human subjects, such as survey respondents reporting random
answers to survey questions in order to finish more quickly.
• Data quality problems, including missing, null, truncated, incorrectly coded, or
corrupted values.
• Phenomena that are so complex or so little understood that they impact the data
in ways that appear to be unsystematic.
Trying to model noise is the basis of a problem called overfitting.
Underfitting & overfitting
Model           Training   Test   Bias              Variance
Underfitting    50%        48%    50% (high bias)   2 (low var.)
Good fit        98%        92%    2% (low bias)     6 (low var.)
Overfitting     99%        60%    1% (low bias)     39 (high var.)
Bias & variance
• Bias
• Measures the accuracy or quality of the model
• Low bias implies that, on average, we will accurately estimate the true parameter from the
training data
• Variance
• Measures precision or specificity of the model
• Low variance implies the model does not change much as the training set varies
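The bias and variance figures quoted for the three example models under "Underfitting & overfitting" can be reproduced with simple arithmetic. The sketch below assumes, as an inference from those numbers rather than a formal definition, that the bias figure is 100 minus the training score and the variance figure is the gap between training and test scores. The function name fit_diagnosis is hypothetical.

```python
def fit_diagnosis(train_score, test_score):
    """Return (bias_proxy, variance_proxy) in percentage points.

    Assumption inferred from the slide's numbers:
      bias proxy     = 100 - training score
      variance proxy = training score - test score
    """
    return 100 - train_score, train_score - test_score

for name, train_score, test_score in [
    ("underfitting", 50, 48),
    ("good fit", 98, 92),
    ("overfitting", 99, 60),
]:
    bias, variance = fit_diagnosis(train_score, test_score)
    print(f"{name}: bias={bias}, variance={variance}")
```

A large first number suggests underfitting; a large second number suggests overfitting.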
[Figure: underfit region and overfit region]
Underfitting & overfitting
• Models with too few parameters are inaccurate because of a large bias
(not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance
(too much sensitivity to the sample randomness).
Regression: Complexity versus Goodness of Fit
Model complexity   Bias      Variance
Low                Highest   Lowest
Medium             Low       Low
High               Low       High
Dr. Yanjun Qi / UVA CS, 10/3/19
Fixes to try:
- Try getting more training examples → fixes high variance.
- Try a smaller set of features → fixes high variance.
- Try a larger set of features → fixes high bias.
Machine Learning Methods
1. Methods based on the amount of human supervision in the learning process
a. Supervised learning (predictive models)
b. Unsupervised learning (descriptive models)
c. Semi-supervised learning
d. Reinforcement learning
2. Methods based on the ability to learn from incremental data samples
a. Batch learning
b. Online learning
3. Methods based on their approach to generalization from data samples
a. Instance-based learning
b. Model-based learning
a. Supervised learning (classification)
[Figure: Classification of breast cancer (malignant, benign). x-axis: Tumor Size; y-axis: Malignant?, a discrete-valued output, 1 (Y) or 0 (N)]
Supervised Learning (Classification)
Classification aims to identify group membership.
Input: {x1, x2, …, xn}, called features
Output: y, a categorical value, called the target value
Ex.: data about computers (training data). Find a model to predict the status of unseen
cases.
Processor (GHz)   Memory (GB)   Status
(feature)         (feature)     (target value)
1.0               1.0           Bad
2.3               4.0           Good
2.6               4.0           Good
3.0               8.0           Good
2.0               4.0           Bad
2.6               0.5           Bad
➔ 3.0             4.0           ???
Prediction tasks in Supervised Learning
Binary classification (e.g., email ⇒ spam/not spam):
classification: the label y is a discrete variable
• e.g., the task of predicting the types of residence
x → f → y ∈ {0, 1}
Regression (e.g., location, year ⇒ housing price):
regression: the label y is a continuous variable
• e.g., price prediction
x → f → y ∈ ℝ
[Figure: Supervised Learning, data plotted on axes x1 and x2]
b. Unsupervised learning
Unsupervised learning is a learning method in which a machine learns
without any supervision from data that has no labels.
• The algorithm needs to act on the data without any supervision.
• The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.
• In unsupervised learning, we don't have a predetermined result. The
machine tries to find useful insights from a huge amount of data.
Unsupervised Learning
• Training data contain only the input vectors (No labeled data)
• Definition of training data: {x1, x2,…, xn}
• Goal: Learn some structures in the inputs.
• Can be divided into two categories: Clustering and Dimensionality Reduction
[Figure: Unsupervised Learning, data plotted on axes x1 and x2]
unsupervised learning (clustering)
There are various types of clustering methods that can be classified under
the following major approaches.
• Centroid based methods such as K-means
• Hierarchical clustering methods
• Distribution based clustering methods such as Gaussian mixture models
• Density based methods such as DBSCAN and OPTICS.
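As an illustration of the centroid-based approach, here is a minimal K-means sketch. It assumes 2-D points, fixed starting centroids, and a fixed number of iterations; the function name kmeans and the sample points are hypothetical, and production code would normally rely on a library implementation.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain K-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why K-means is often run several times with different initializations.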
unsupervised learning (dimensionality reduction)
• Feature selection methods
• Feature extraction methods
• These methods reduce the number of feature variables by extracting or selecting a
set of principal or representative features.
• There are multiple popular algorithms available for dimensionality reduction, such as
Principal Component Analysis (PCA) and discriminant analysis.
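A very simple feature selection method drops features that barely vary across samples. The sketch below is an assumption-laden illustration of that idea, not PCA: the function name select_by_variance, the threshold, and the sample rows are hypothetical.

```python
from statistics import pvariance

def select_by_variance(rows, threshold):
    """Return the indices of feature columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]

# Hypothetical dataset: the first feature is constant and carries no information.
rows = [
    (1.0, 5.0, 0.2),
    (1.0, 7.0, 0.9),
    (1.0, 6.0, 0.4),
    (1.0, 9.0, 0.7),
]
kept = select_by_variance(rows, threshold=0.01)
reduced = [tuple(row[i] for i in kept) for row in rows]
print(kept)
print(reduced)
```

Feature extraction methods such as PCA go further by building new features as combinations of the original ones.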
unsupervised learning (anomaly detection)
Pattern discovery
• Unsupervised learning methods can be used for anomaly detection such that we
train the algorithm on the training dataset having normal, non-anomalous data
samples.
• Once it learns the necessary data representations, patterns, and relations among
attributes in normal samples, for any new data sample, it would be able to identify
it as anomalous or a normal data point by using its learned knowledge.
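The idea of learning from normal samples only and then flagging unusual new samples can be sketched with a z-score rule. This is one simple choice among many, and the threshold of 3 standard deviations, the function names, and the transaction amounts are assumptions made for illustration.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_values):
    """Learn the mean and standard deviation of the normal training samples."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, profile, z_threshold=3.0):
    """Flag a value whose z-score against the normal profile exceeds the threshold."""
    mu, sigma = profile
    return abs(value - mu) / sigma > z_threshold

# Hypothetical training data: transaction amounts known to be normal.
normal_amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 18.5, 22.0]
profile = fit_normal_profile(normal_amounts)

print(is_anomalous(21.5, profile))   # close to the normal range
print(is_anomalous(250.0, profile))  # far outside the normal range
```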
unsupervised learning (Association Rule-Mining)
Association rules help in detecting and predicting transactional patterns
based on the knowledge gained from training transactions.
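Two standard quantities used to score an association rule are support and confidence. The sketch below computes both for a rule of the form "if bread then milk". The transactions and function names are hypothetical, and real rule-mining algorithms such as Apriori add an efficient search over candidate itemsets on top of these measures.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical training transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```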
d. Reinforcement learning
Instance-Based learning vs model-Based learning
Instance-based learning
• The instance-based learning works by looking at the input data points and using
a similarity metric to generalize and predict for new data points.
• A simple example would be a K-nearest neighbor algorithm.
Model-based learning
• The model-based learning methods are a more traditional ML approach toward
generalizing based on training data.
• Typically an iterative process takes place where the input data is used to extract
features and models are built based on various model parameters (known as hyperparameters).
• These hyperparameters are optimized based on various model validation techniques to
select the model that generalizes best on the training data and some amount of validation
and test data (split from the initial dataset).
Machine learning in practice
• To apply the learning process to real-world tasks, we'll use a life-cycle
development project. Any machine learning algorithm can be deployed by
following these steps:
1- Data retrieval
2- Data preparation
3- Modeling
4- Model evaluation and tuning
5- Deployment
Data retrieval
1- Data collection: collect all the data needed for your business objective,
from sources such as historical data warehouses, data marts, data lakes, and so on.
2- Data description: analyze the data to understand its nature,
such as: data sources (SQL, NoSQL, Big Data), data volume (size, number of
records, total databases, tables), data attributes and their description (variables,
data types), relationship and mapping schemes (understand attribute
representations), and basic descriptive statistics (mean, median, variance), focusing on
the most important attributes.
3- Exploratory data analysis: explore, describe, and visualize data attributes.
4- Data quality analysis: check for missing values, inconsistent values, and wrong information and
metadata.
Data retrieval
Types of input data: Features also come in various forms.
• If a feature represents a characteristic measured in numbers, it is unsurprisingly called
numeric.
• Alternatively, if a feature is an attribute that consists of a set of categories, the feature is
called categorical or nominal.
• A special case of categorical variables is called ordinal, which designates a nominal
variable with categories falling in an ordered list.
• Some examples of ordinal variables include clothing sizes such as small, medium, and
large; or a measurement of customer satisfaction on a scale from "not at all happy" to
"very happy."
• It is important to consider what the features represent, as the type and number of
features in your dataset will assist in determining an appropriate machine learning
algorithm for your task.
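Because many algorithms expect numbers, categorical features are usually encoded before modeling. The sketch below shows one common convention, offered as an assumption rather than the only option: ordinal features are mapped to their rank, and nominal features are one-hot encoded. The function names and category lists are hypothetical.

```python
SIZE_ORDER = ["small", "medium", "large"]  # ordinal: the order matters

def encode_ordinal(value, ordered_categories):
    """Map an ordinal category to its rank, preserving the order."""
    return ordered_categories.index(value)

def encode_nominal(value, categories):
    """One-hot encode a nominal category, which has no inherent order."""
    return [1 if value == c else 0 for c in categories]

print(encode_ordinal("medium", SIZE_ORDER))
print(encode_nominal("green", ["red", "green", "blue"]))
```

Encoding a nominal feature as a rank would impose an order that does not exist, which is why the two cases are handled differently.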
Data preparation
1. Data integration: mainly done when we have multiple datasets that we might
want to integrate or merge.
2. Data wrangling: data pre-processing, cleaning, normalization, and formatting.
3. Attribute generation and selection: selecting a subset of features
or attributes from the dataset based on parameters like attribute importance,
quality, relevancy, assumptions, and constraints.
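Normalization, mentioned under data wrangling, can be illustrated with min-max scaling, which maps a numeric feature onto the range 0 to 1. This is one common choice and an assumption here; the slides do not name a specific method. The values are the Memory (GB) column from the computer example, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

memory_gb = [1.0, 4.0, 4.0, 8.0, 4.0, 0.5]
print(min_max_normalize(memory_gb))
```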
3- Modeling: In the process of modeling, we usually feed the data features to a Machine
Learning method or algorithm and train the model, typically to optimize a specific cost
function, in most cases with the objective of reducing errors and generalizing the
representations learned from the data.
4- Model evaluation and tuning: Built models are evaluated and tested on validation
datasets, and their performance is judged with metrics like accuracy, F1 score, and others.
Models have various parameters that are tuned in a process
called hyperparameter optimization to get models with the best results.
5- Deployment and monitoring: Selected models are deployed in production and are
constantly monitored based on their predictions and results.
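The evaluation metrics named in step 4 follow directly from counts of true and false positives and negatives. The sketch below computes accuracy, precision, recall, and F1 for a binary task; the label vectors and the function name classification_metrics are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical validation labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```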
Standard machine learning pipeline
Machine learning in practice
• Matching dataset to algorithms:
Supervised learning algorithms
Model                         Learning task
Nearest neighbor              Classification
Naïve Bayes                   Classification
Decision tree                 Classification
Classification rule learner   Classification
Linear regression             Numeric prediction
Regression tree               Numeric prediction
Model trees                   Numeric prediction
Neural network                Dual use
Support vector machine        Dual use
• Matching input data to algorithms:
Unsupervised learning algorithms
Model                Learning task
Association rules    Pattern detection
K-means clustering   Clustering
“A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on T, as
measured by P, improves with experience E.”
Suppose your email program watches which emails you do or do
not mark as spam, and based on that learns how to better filter
spam. What is the task T in this setting?
Classifying emails as spam or not spam.
Watching you label emails as spam or not spam.
The number (or fraction) of emails correctly classified as spam/not spam.
None of the above—this is not a machine learning problem.
Answer, with each option mapped to the definition:
T: Classifying emails as spam or not spam.
E: Watching you label emails as spam or not spam.
P: The number (or fraction) of emails correctly classified as spam/not spam.
You’re running a company, and you want to develop learning algorithms to address each
of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many
of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each
account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
Treat both as classification problems.
Treat problem 1 as a classification problem, problem 2 as a regression problem.
Treat problem 1 as a regression problem, problem 2 as a classification problem.
Treat both as regression problems.
Of the following examples, which would you address using an
unsupervised learning algorithm? (Check all that apply.)
Given email labeled as spam/not spam, learn a spam filter.
Given a set of news articles found on the web, group them into
sets of articles about the same story.
Given a database of customer data, automatically discover market
segments and group customers into different market segments.
Given a dataset of patients diagnosed as either having diabetes or
not, learn to classify new patients as having diabetes or not.
The k-NN algorithm
• The nearest neighbors approach to classification is exemplified by the k-nearest
neighbors algorithm (k-NN).
• The k-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be assigned to the
best-suited category by using the k-NN algorithm.
• The strengths and weaknesses of this algorithm are as follows:
Strengths
• Simple and effective.
• Makes no assumptions about the underlying data distribution.
• Fast training phase.
Weaknesses
• Limited ability to understand how the features are related to the class.
• Requires selection of an appropriate k.
• Slow classification phase.
• Nominal features and missing data require additional processing.
The k-NN algorithm
• The k-NN algorithm gets its name from the fact that it uses information about an
example's k-nearest neighbors to classify unlabeled examples.
• The letter k is a variable term implying that any number of nearest neighbors could be
used.
• After choosing k, the algorithm requires a training dataset of examples classified into
several categories, as labeled by a nominal variable.
• Then, for each unlabeled record in the test dataset, k-NN identifies k records in the
training data that are the "nearest" in similarity.
• The unlabeled test instance is assigned the class of the majority of the k nearest neighbors.
The K-NN algorithm:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to each point in the
training data:
$\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}$
• Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
• Step-4: Among these K neighbors, count the number of data points in
each category.
• Step-5: Assign the new data point to the category that has the largest number
of neighbors.
• Step-6: Our model is ready.
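The steps above can be sketched as a short function. The data are the six computers from the supervised learning example (processor in GHz, memory in GB, status), and the query is the unlabeled case (3.0, 4.0). The function name knn_classify is hypothetical, and the features are used unscaled for simplicity, which is an assumption: in practice features on different scales are usually normalized first.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 2 and 3: sort by Euclidean distance and keep the k nearest.
    by_distance = sorted(training, key=lambda row: math.dist(row[0], query))
    # Steps 4 and 5: count labels among the neighbors and pick the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

computers = [
    ((1.0, 1.0), "Bad"),
    ((2.3, 4.0), "Good"),
    ((2.6, 4.0), "Good"),
    ((3.0, 8.0), "Good"),
    ((2.0, 4.0), "Bad"),
    ((2.6, 0.5), "Bad"),
]
print(knn_classify(computers, (3.0, 4.0), k=3))
```

With k = 3 the nearest neighbors are two "Good" machines and one "Bad" machine, so the query is labeled "Good". With k = 5 the vote flips to "Bad", which shows how sensitive the result can be to the choice of k.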
• Example:
A car manufacturer has produced a new SUV. The company wants to show
ads to the users who are interested in buying that SUV. For this problem,
we have a dataset that contains information about multiple users,
collected through a social network.
[Figure: scatter plot of the dataset; y-axis from 0 to 160000, x-axis from 0 to 60]
[Figure: the same plot with a new data point added. Caption: New data is classified to the appropriate class]
[Diagram: Training Set → Learning Algorithm → Logical model; Testing Set → Logical model → Predicted output]
• In case of a very large value of k, we may include points from other classes in the
neighborhood.
• In case of too small a value of k, the algorithm is very sensitive to noise.
The k-NN algorithm: parameters to tune
➢ The k-NN algorithm can be used for imputing missing values of both categorical and
continuous variables.
➢ For numerical values, Euclidean distance is a good choice. You might want to
try Manhattan distance, which is sometimes used as well. For text analytics, cosine
distance can be another good alternative worth trying.
➢ The algorithm’s training phase consists only of storing the feature vectors and class
labels of the training samples.
➢ In the testing phase, a test point is classified by assigning the label that is most
frequent among the k training samples nearest to that query point; hence, the testing
phase requires more computation.
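The three distance measures mentioned above can be written out directly. This is a minimal sketch for vectors given as tuples of numbers; the function names and the example points are hypothetical, and cosine distance is taken here as 1 minus cosine similarity, which is a common convention.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine_distance(p, q):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)

print(euclidean((1, 2), (4, 6)))
print(manhattan((1, 2), (4, 6)))
print(cosine_distance((1, 0), (0, 1)))
```

Cosine distance ignores vector length and compares direction only, which is one reason it is popular for text represented as word-count vectors.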
• The k-NN algorithm does more computation at test time than
at training time.
(a) True (b) False
• Which of the following statements are true about the k-NN
algorithm?
1. k-NN performs much better if all of the data have the same
scale
2. k-NN works well with a small number of input variables (p), but
struggles when the number of inputs is very large
3. k-NN makes no assumptions about the functional form of the
problem being solved
(a) 1 and 3 (b) 1 and 2 (c) all of the above
• Which of the following is the Euclidean distance between the
two data points A(1,3) and B(2,3)?
(a) 1 (b) 2 (c) 4 (d) 8
• Which of the following is true about k in k-NN in terms of
bias?
(A) When you increase k, the bias increases
(B) When you decrease k, the bias increases
(C) Can’t say
(D) None of these
• A company has built a k-NN classifier that gets 100% accuracy
on training data. When they deployed this model on the client side,
it was found that the model is not at all accurate. Which of
the following might have gone wrong?
• Note: The model was deployed successfully and no technical issues
were found on the client side except the model performance.
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Dr. Sara Sweidan
Sweidan_ds@fci.bu.edu.eg