Artificial Intelligence and Automation
Training Linear Models
Ph.D. Gerardo Marx Chávez-Campos
Instituto Tecnológico de Morelia: Ing. Mecatrónica
Introduction
Summary
◮ The Classification/Prediction task is performed by a function that
converts some input into a desired output.
◮ Error is the main measure used to determine whether our
Classification/Prediction task performs well.
◮ A problem with the model’s adjustments is that the model is
updated to match only the last training example, discarding all
previous training examples.
◮ A good way to fix this is to moderate the updates with a
learning rate (α); thus, no single training example totally
dominates the learning (a minimal sketch follows).
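As an illustration only, here is a minimal Python sketch of how the learning rate α moderates each per-example update; the feature and target arrays below are hypothetical placeholders, not data from the laboratory.

```python
import numpy as np

# Hypothetical 1-D data: one feature value and one target per example
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

theta0, theta1 = 0.0, 0.0   # model parameters (intercept, slope)
alpha = 0.01                # learning rate: scales every update

# One pass over the training set, updating after each example
for xi, yi in zip(X, y):
    error = (theta0 + theta1 * xi) - yi
    # Without alpha, the parameters would jump to fit only this example;
    # alpha keeps each correction small so earlier examples still matter.
    theta0 -= alpha * error
    theta1 -= alpha * error * xi

print(theta0, theta1)
```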
Introduction
So far, Machine Learning models and their training have been black boxes.
In this Lecture, we will start by looking at the Linear Regression
model, one of the simplest models. We will discuss two different
ways to train it:
◮ Using a direct “closed-form” equation that directly computes
the model parameters that best fit the model to the training
set.
◮ Using an iterative optimization approach called Gradient
Descent (GD) that gradually tweaks the model parameters to
minimize the cost function over the training set.
Next, we will look at Polynomial Regression, a more complex
model that can fit non-linear datasets.
Finally, we will look at two more models that are commonly used for
classification tasks: Logistic Regression and Softmax Regression.
Linear Regression I
In the first laboratory session, we developed a simple regression model
of life satisfaction:
lifeSatis = θ0 + θ1 × GDPperCapita (1)
where θ0 and θ1 are the model’s parameters.
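As a sketch, equation (1) can be fitted with Scikit-Learn. The GDP and life-satisfaction values below are made-up placeholders, not the real dataset used in the laboratory.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data (placeholder values)
gdp_per_capita = np.array([[20000], [30000], [40000], [50000]])  # feature
life_satisfaction = np.array([5.5, 6.0, 6.5, 7.0])               # target

model = LinearRegression()
model.fit(gdp_per_capita, life_satisfaction)

# theta_0 (bias) and theta_1 (weight) of equation (1)
print(model.intercept_, model.coef_[0])
print(model.predict([[35000]]))   # prediction for a new country
```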
Linear Regression II
More generally, a linear model makes a prediction by simply computing
a weighted sum of the input features plus a constant called the
bias term (intercept term):
ŷ = θ0 + θ1 x1 + θ2 x2 + θ3 x3 + · · · + θn xn (2)
with ŷ as the predicted value and
◮ n is the number of features
◮ xi is the ith feature value
◮ θj is the j th model parameter (θ0 is the bias term; θ1 , . . . , θn are the feature weights)
Vectorized form
A vectorized form of the Linear Regressor is:
ŷ = hθ (x) = θ · x (3)
◮ θ is the model’s parameter vector
◮ x is the instance’s feature vector, containing x0 to xn , with
x0 = 1
◮ θ · x is the dot product θ0 x0 + θ1 x1 + θ2 x2 + θ3 x3 + · · · + θn xn
◮ hθ is the hypothesis function, using the model parameters θ (see the sketch below)
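A small NumPy sketch of equation (3), assuming made-up parameter and feature values, with x0 = 1 prepended so the bias term is handled by the dot product.

```python
import numpy as np

theta = np.array([4.0, 3.0, -2.0])       # [theta_0, theta_1, theta_2]
x_features = np.array([1.5, 0.5])        # raw features x_1, x_2
x = np.concatenate(([1.0], x_features))  # prepend x_0 = 1 for the bias term

# Hypothesis h_theta(x) = theta · x
y_hat = np.dot(theta, x)
print(y_hat)   # 4 + 3*1.5 - 2*0.5 = 7.5
```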
How do we train it?
◮ Training a model means setting its parameters so that the model
best fits the training set.
◮ We need a measure of how well (or poorly) the model
fits the data.
◮ The Root Mean Square Error (RMSE) is the most common
measure.
◮ To train the LR model, you need to find the θ that minimizes the
RMSE. In practice, it is simpler to minimize the MSE, which leads to the same θ.
The MSE Cost Function
The Mean Square Error (MSE) of a Linear Regression hypothesis
hθ on a training set X is calculated using:

MSE(X, hθ) = (1/m) Σ_{i=1}^{m} (θ · x^(i) − y^(i))^2 (4)

J(θ) = MSE(X, hθ) (5)

where m is the number of training instances, and x^(i), y^(i) are the feature vector and target of the ith instance.
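A direct NumPy implementation of equation (4), using a hypothetical design matrix X (with the x0 = 1 column already included) and a hypothetical target vector y; the RMSE is simply the square root of this quantity.

```python
import numpy as np

def mse(X, y, theta):
    """Mean Squared Error of the hypothesis h_theta over the training set."""
    errors = X @ theta - y        # theta · x^(i) - y^(i) for every instance
    return np.mean(errors ** 2)   # (1/m) * sum of squared errors

# Hypothetical data: 3 instances, bias column x_0 = 1 plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 2.5, 3.5])
theta = np.array([1.0, 0.8])

print(mse(X, y, theta))             # J(theta)
print(np.sqrt(mse(X, y, theta)))    # RMSE
```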
The Normal Equation I
To find the value of θ that minimizes the cost function J(θ), there
is a closed-form solution, that is, a mathematical equation that
gives the result directly. This is called the Normal Equation.
The minimizer satisfies

∂J(θ)/∂θ = 0

Writing the training instances as the rows of the design matrix X
(with x0 = 1 in every row) and the targets as the vector y, and
dropping the constant factor 1/m (it does not change the minimizer),
the cost becomes

J(θ) ∝ (Xθ − y)^T (Xθ − y) = [(Xθ)^T − y^T](Xθ − y)
The Normal Equation II
Taking the derivative of the cost and setting it to zero:

∂J(θ)/∂θ = 0

∂J(θ)/∂θ = ∂/∂θ (Xθ − y)^T (Xθ − y) = ∂/∂θ [(Xθ)^T − y^T](Xθ − y)
The Normal Equation III
Theorem. The following properties hold:
(A^T)^T = A
(A + B)^T = A^T + B^T
(kA)^T = k A^T
(AB)^T = B^T A^T
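A quick NumPy check of these properties on arbitrary matrices (random values, purely illustrative); note in particular that the transpose of a product reverses the order of the factors.

```python
import numpy as np

A = np.random.rand(3, 2)
B = np.random.rand(3, 2)   # same shape as A, for the sum property
C = np.random.rand(2, 4)   # compatible shape for the product property
k = 2.5

print(np.allclose((A.T).T, A))             # (A^T)^T = A
print(np.allclose((A + B).T, A.T + B.T))   # (A + B)^T = A^T + B^T
print(np.allclose((k * A).T, k * A.T))     # (kA)^T = k A^T
print(np.allclose((A @ C).T, C.T @ A.T))   # (AC)^T = C^T A^T
```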
The Normal Equation IV
Note that (Xθ)^T y = y^T (Xθ), since both are scalars. Expanding the
product and setting the derivative to zero:

0 = ∂/∂θ [(Xθ)^T (Xθ) − (Xθ)^T y − y^T (Xθ) + y^T y]
0 = ∂/∂θ [θ^T X^T X θ − 2 (Xθ)^T y + y^T y]
0 = 2 X^T X θ − 2 X^T y
2 X^T X θ = 2 X^T y
X^T X θ = X^T y
θ̂ = (X^T X)^{-1} X^T y
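A NumPy sketch of the Normal Equation on made-up data generated from y = 4 + 3x plus noise; the closed-form estimate should agree with Scikit-Learn's LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y ≈ 4 + 3x with Gaussian noise
m = 100
x = 2 * np.random.rand(m, 1)
y = 4 + 3 * x[:, 0] + np.random.randn(m)

X = np.c_[np.ones((m, 1)), x]   # add x_0 = 1 to every instance

# Normal Equation: theta_hat = (X^T X)^(-1) X^T y
theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta_hat)                           # approximately [4, 3]

# Cross-check against Scikit-Learn
lin_reg = LinearRegression().fit(x, y)
print(lin_reg.intercept_, lin_reg.coef_)   # same values
```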
References
https://www.geeksforgeeks.org/ml-normal-equation-in-linear-regression/
https://prutor.ai/normal-equation-in-linear-regression/
https://towardsdatascience.com/performing-linear-regression-using-the-normal-equation-6372ed3c57
Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, 2017.