Foundations of Machine Learning
Dr. Panashe Chiurunge
Machine Learning
TensorFlow
Directed Acyclic Graphs
TensorFlow Eager Execution Mode
TensorFlow Keras API
Linear Regression with TensorFlow
What is Machine Learning
The types of Machine Learning
What is Machine Learning
We are trying to learn from data, or to learn a representation of the data.
To formulate the basic learning-from-data problem, we must specify several basic elements: data spaces, probability measures, loss functions, and statistical risk.
Machine Learning – Data Space
We have to learn from some data
Learning from data begins with the specification of two spaces: an input space X and an output space Y.
The input space X is also sometimes called the feature space.
The output space Y is also called the "label space", "outcome space", "signal range", or, in statistical regression, the "response space".
Machine Learning
We then want to learn a function that maps points in the feature space to outputs, in spite of random noise within the data.
Machine Learning
The basic problem in machine learning is to determine a mapping f : X → Y that takes an input x ∈ X and predicts the output y ∈ Y.
Machine Learning – Loss Functions
Since we are trying to predict/classify labels, we need to measure the performance of our learner in some way.
Suppose we have a true label y ∈ Y and a label prediction ŷ ∈ Y.
A loss function measures how "different" these two quantities are. Formally, a loss function is a map ℓ : Y × Y → [0, ∞).
Machine Learning – Loss Functions
Cost function: in regression or estimation problems, Y = ℝ, and the squared error loss ℓ(y, ŷ) = (y − ŷ)² is often employed.
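The squared error loss above can be sketched directly; the function name below is ours, not part of any library:

```python
# Squared error loss: a minimal sketch of the loss map l(y, y_hat).
def squared_error(y_true, y_pred):
    """Return the squared difference between a true label and a prediction."""
    return (y_true - y_pred) ** 2

# A perfect prediction incurs zero loss; errors grow quadratically.
print(squared_error(3.0, 3.0))  # 0.0
print(squared_error(3.0, 1.0))  # 4.0
```

Squaring keeps the loss non-negative and penalizes large errors more heavily than small ones.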
Machine Learning – Loss Functions
Cost function: the loss function can be used to measure the "risk" of a learning rule f, i.e. the expected loss R(f) = E[ℓ(Y, f(X))].
We have to minimize this risk as we learn our data representation.
Machine Learning – Linear Regression
Linear regression is simply finding the best possible line of fit to represent a set of data points.
In machine learning terms, we are creating a learning rule that fits a line to represent our data.
Machine Learning – Linear Regression
Let’s suppose we want to model the above set of points with a line.
To do this we’ll use the standard line equation y = mx + b, where m is the line’s gradient and b is the line’s intercept.
Machine Learning – Linear Regression
To find the best line for our data, we need to find the best pair of gradient (m) and intercept (b) values.
Machine Learning – Linear Regression
A standard approach to solving this type of problem is
to define an error function (also called a cost
function/loss function) that measures how “good” a
given line is.
This function will take in a (m,b) pair and return an error
value based on how well the line fits our data.
Machine Learning – Linear Regression
To compute this error for a given line, we’ll iterate through each (x, y) point in our data set and sum the squared distances between each point’s y value and the candidate line’s y value (computed as mx + b).
It’s conventional to square this distance to ensure that it is positive and to make our cost function differentiable.
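The error computation described above can be sketched as follows (we average the squared distances over the N points; the function name is ours):

```python
# Mean squared error of a candidate line y = m*x + b over a set of points.
def compute_error(m, b, points):
    total = 0.0
    for x, y in points:
        # Vertical distance between the point and the line, squared.
        total += (y - (m * x + b)) ** 2
    return total / len(points)

pts = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on y = 2x
print(compute_error(2.0, 0.0, pts))  # 0.0 -- this line passes through every point
```

A worse line, e.g. m = 0 and b = 0, yields a strictly larger error value.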
Machine Learning – Linear Regression
Our loss function is the mean of these squared distances:
E(m, b) = (1/N) Σᵢ₌₁ᴺ (yᵢ − (m·xᵢ + b))²
Machine Learning – Linear Regression
Our loss function is E(m, b) = (1/N) Σᵢ (yᵢ − (m·xᵢ + b))².
Lines that fit our data better (where better is defined by
our cost function) will result in lower error values.
If we minimize this function, we will get the best line of
fit to represent our data.
Machine Learning – Linear Regression
Since our cost function depends on two parameters (m and b), we can visualize it as a two-dimensional surface.
Machine Learning – Gradient Descent
Each point in this two-dimensional space represents a line. The
height of the function at each point is the error value for that line.
You can see that some lines yield smaller error values than
others (i.e., fit our data better). When we run gradient descent
search, we will start from some location on this surface and
move downhill to find the line with the lowest error.
Machine Learning – Gradient Descent
To run gradient descent on this error function, we first need to
compute its gradient.
The negative gradient will act like a compass and always point us downhill.
To compute it, we will need to differentiate our error function.
Since our function is defined by two parameters (m and b), we
will need to compute a partial derivative for each.
These derivatives work out to be:
∂E/∂m = (2/N) Σᵢ −xᵢ·(yᵢ − (m·xᵢ + b))
∂E/∂b = (2/N) Σᵢ −(yᵢ − (m·xᵢ + b))
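These two partial derivatives translate directly into code; the sketch below is a plain-Python version (the function name is ours):

```python
# Partial derivatives of the mean squared error with respect to m and b.
def compute_gradients(m, b, points):
    n = len(points)
    grad_m = 0.0
    grad_b = 0.0
    for x, y in points:
        residual = y - (m * x + b)  # vertical distance to the candidate line
        grad_m += (-2.0 / n) * x * residual
        grad_b += (-2.0 / n) * residual
    return grad_m, grad_b
```

At a perfect fit every residual is zero, so both gradients vanish and the search stops moving.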
Machine Learning – Gradient Descent
We can initialize our search to start at any pair of m and b
values (i.e., any line) and let the gradient descent algorithm
march downhill on our error function towards the best line.
Each iteration will update m and b to a line that yields slightly
lower error than the previous iteration.
The direction to move in for each iteration is calculated using the
two partial derivatives
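One iteration of this update can be sketched as a single step function; repeated steps march m and b downhill (names and the sample data are ours, chosen so the true line is y = 2x + 1):

```python
# One gradient-descent update for m and b using the MSE partial derivatives.
def gradient_descent_step(m, b, points, learning_rate):
    n = len(points)
    grad_m = sum((-2.0 / n) * x * (y - (m * x + b)) for x, y in points)
    grad_b = sum((-2.0 / n) * (y - (m * x + b)) for x, y in points)
    # Move against the gradient, scaled by the learning rate.
    return m - learning_rate * grad_m, b - learning_rate * grad_b

# Points generated by y = 2x + 1; repeated steps should recover m ≈ 2, b ≈ 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
m, b = 0.0, 0.0
for _ in range(5000):
    m, b = gradient_descent_step(m, b, points, learning_rate=0.05)
```

Starting from any (m, b) pair, each step yields a line with slightly lower error than the last.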
Machine Learning – Gradient Descent
The learning rate variable controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum.
However, if we take steps that are too small, it will require many iterations to arrive at the minimum.
Machine Learning – Gradient Descent
We can also observe how the error changes as we move toward
the minimum. A good way to ensure that gradient descent is
working correctly is to make sure that the error decreases for
each iteration.
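This check can be sketched by recording the error after every update and verifying it never increases (sample data and variable names are ours):

```python
# Track the error after each iteration to verify gradient descent is working.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
m, b, lr = 0.0, 0.0, 0.05
n = len(points)
errors = []
for _ in range(100):
    grad_m = sum((-2.0 / n) * x * (y - (m * x + b)) for x, y in points)
    grad_b = sum((-2.0 / n) * (y - (m * x + b)) for x, y in points)
    m -= lr * grad_m
    b -= lr * grad_b
    errors.append(sum((y - (m * x + b)) ** 2 for x, y in points) / n)

# With a suitable learning rate, the error shrinks on every iteration.
assert all(later <= earlier for earlier, later in zip(errors, errors[1:]))
```

If the recorded error ever rises, the learning rate is likely too large.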
Machine Learning – Gradient Descent
We can also observe how the error changes as we move toward
the minimum. A good way to ensure that gradient descent is
working correctly is to make sure that the error decreases for
each iteration.
Machine Learning – Gradient Descent
Do the following until convergence:
m ← m − α · ∂E/∂m
b ← b − α · ∂E/∂b
where α is the learning rate.
Machine Learning –
Stochastic Gradient Descent
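Stochastic gradient descent updates m and b from one randomly chosen point per step instead of the full dataset. A minimal sketch, assuming noiseless data on y = 2x + 1 (the function name and defaults are ours, not a library API):

```python
import random

# SGD sketch: each update uses ONE randomly sampled point.
def sgd_fit(points, learning_rate=0.01, steps=20000, seed=0):
    rng = random.Random(seed)
    m, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(points)
        residual = y - (m * x + b)
        # Per-sample gradient of the squared error, negated and scaled.
        m += learning_rate * 2.0 * x * residual
        b += learning_rate * 2.0 * residual
    return m, b

points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
m, b = sgd_fit(points)
```

Each step is much cheaper than a full-batch step, at the cost of noisier progress toward the minimum.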
Machine Learning – GD Algorithms
Stochastic Gradient Descent (SGD)
Adaptive Moment Estimation (Adam)
Nesterov Accelerated Gradient (NAG)
Adaptive Gradient (AdaGrad)
Adaptive Learning Rate Method (AdaDelta)
Root Mean Square Propagation (RMSProp)
Machine Learning
Q&A