Regression Analysis in Machine Learning

Context:

In order to understand the motivation behind regression, let's consider the following simple example. The scatter plot below shows the number of college graduates in the US from the year 2001 to 2012.

Now, based on the available data, suppose someone asks you how many college graduates with master's degrees there will be in the year 2018. It can be seen that the number of college graduates with master's degrees increases almost linearly with the year. So by simple visual analysis, we can get a rough estimate of that number as being between 2.0 and 2.1 million. Let's look at the actual numbers. The graph below plots the same variable from the year 2001 to the year 2018. It can be seen that our predicted number was in the ballpark of the actual value. Since this was a simple problem (fitting a line to data), our mind was easily able to do it. This process of fitting a function to a set of data points is known as regression analysis.

What is Regression Analysis?

Regression analysis is the process of estimating the relationship between a dependent variable and independent variables. In simpler words, it means fitting a function from a selected family of functions to the sampled data under some error function. Regression analysis is one of the most basic tools in machine learning used for prediction. Using regression, you fit a function to the available data and try to predict the outcome for future or held-out data points. This fitting of a function serves two purposes:

1. You can estimate missing data within your data range (interpolation).
2. You can estimate future data outside your data range (extrapolation).

Some real-world examples of regression analysis include predicting the price of a house given its features, predicting the impact of SAT/GRE scores on college admissions, predicting sales based on input parameters, predicting the weather, etc.

Let's consider the previous example of college graduates.

1. Interpolation: Let's assume we have access to somewhat sparse data, where we know the number of college graduates only every 4 years, as shown in the scatter plot below.

We want to estimate the number of college graduates for all the missing years in between. We can do this by fitting a line to the limited available data points. This process is called interpolation.
2. Extrapolation: Let's assume we have access to limited data from the year 2001 to the year 2012, and we want to predict the number of college graduates from the year 2013 to 2018.

It can be seen that the number of college graduates with master's degrees increases almost linearly with the year. Hence, it makes sense to fit a line to the dataset. Fitting a line using the 12 available points and then testing its prediction on the 6 future points, it can be seen that the prediction is very close.
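As a rough sketch of this experiment in Python (the graduate counts below are synthetic placeholders generated from a straight line plus noise, not the article's actual data), we can fit a line on the 2001-2012 points with NumPy and check its predictions for 2013-2018:

```python
# Sketch of the extrapolation experiment: fit on 2001-2012, predict 2013-2018.
# The graduate counts (in millions) are made-up placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
train_years = np.arange(2001, 2013)
train_grads = 0.046 * train_years - 90.798 + rng.normal(0, 0.01, 12)
test_years = np.arange(2013, 2019)

slope, intercept = np.polyfit(train_years, train_grads, deg=1)  # fit a line
predictions = slope * test_years + intercept
for year, pred in zip(test_years, predictions):
    print(f"{year}: {pred:.3f} million (predicted)")
```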

Mathematically speaking, given data points $(x_i, y_i)$, $i = 1, \dots, N$, regression fits a function $f_\beta$ from a chosen family of functions by finding the parameters $\beta$ that minimize a loss function $l$:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} l\big(f_\beta(x_i), y_i\big)$$

Types of regression analysis

Now let's talk about the different ways in which we can carry out regression. Based on the family of functions $f_\beta$ and the loss function $l$ used, we can categorize regression as follows.
1. Linear Regression

In linear regression, the objective is to fit a hyperplane (a line for 2D data points) by minimizing the sum of squared errors over the data points.

Mathematically speaking, linear regression solves the following problem:

$$\hat{\beta} = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{N} \big(y_i - (\beta_0 + \beta_1 x_i)\big)^2$$

Hence we need to find the 2 variables, denoted by $\beta_0$ and $\beta_1$, that parameterize the linear function $f(\cdot)$. An example of linear regression can be seen in Figure 4 above, where P = 5. The figure also shows the fitted linear function with $\beta_0 = -90.798$ and $\beta_1 = 0.046$.
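As a minimal sketch, a similar fit can be reproduced with scikit-learn's LinearRegression; the data below is generated from the reported line plus a little noise, since the article's raw numbers are not included here.

```python
# Minimal linear regression sketch with scikit-learn.
# Synthetic (year, graduates) pairs stand in for the article's data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
years = np.arange(2001, 2013).reshape(-1, 1)                       # x, shape (N, 1)
grads = 0.046 * years.ravel() - 90.798 + rng.normal(0, 0.01, 12)   # y

model = LinearRegression().fit(years, grads)
print("beta_1 (slope):    ", round(model.coef_[0], 3))
print("beta_0 (intercept):", round(model.intercept_, 3))
print("prediction for 2018:", round(model.predict([[2018]])[0], 3))
```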

2. Polynomial Regression

Linear regression assumes that the relationship between the dependent variable (y) and the independent variable (x) is linear. It fails to fit the data points when the relationship between them is not linear. Polynomial regression expands the fitting capabilities of linear regression by instead fitting a polynomial of degree m to the data points. The richer the family of functions under consideration, the better (in general) its fitting capabilities. Mathematically speaking, polynomial regression solves the following problem:

$$\hat{\beta} = \arg\min_{\beta_0, \dots, \beta_m} \sum_{i=1}^{N} \Big(y_i - \sum_{j=0}^{m} \beta_j x_i^j\Big)^2$$

Hence we need to find the (m+1) variables denoted by $\beta_0, \dots, \beta_m$. It can be seen that linear regression is a special case of polynomial regression with degree m = 1.

Consider the following set of data points plotted as a scatter plot. If we use linear regression, we get a fit that clearly fails to estimate the data points. But if we use polynomial regression with degree 6, we get a much better fit, as shown below.

[Left] Scatter plot of data — [Center] Linear regression on data — [Right] Polynomial regression of degree 6

Since the data points did not have a linear relationship between the dependent and independent variables, linear regression failed to estimate a good fitting function. Polynomial regression, on the other hand, was able to capture the non-linear relationship.
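A small sketch of this comparison, assuming synthetic sine-shaped data (the article's actual points aren't reproduced here), using scikit-learn's PolynomialFeatures to expand x before a linear fit:

```python
# Polynomial regression: expand x into [1, x, x^2, ..., x^m], then fit linearly.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 30)   # non-linear data

for degree in (1, 6):   # degree 1 is plain linear regression
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    print(f"degree {degree}: training R^2 = {model.score(x, y):.3f}")
```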

3. Ridge Regression

Ridge regression addresses the issue of overfitting in regression analysis. To understand this, consider the same example as above. When a polynomial of degree 25 is fit to the data with only 10 training points, it can be seen that it fits the red data points perfectly (center figure below). But in doing so, it compromises the points in between (note the spike between the last two data points). Ridge regression tries to address this issue: it tries to minimize the generalization error by compromising the fit on the training points.

[Left] Scatter plot of data — [Center] Polynomial regression of degree 25 — [Right] Polynomial ridge regression of degree 25

Mathematically speaking, ridge regression solves the following problem by modifying the loss function:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \big(y_i - f_\beta(x_i)\big)^2 + \alpha \|\beta\|_2^2$$

The function f(x) can be either linear or polynomial. In the absence of ridge regularization, when the function overfits the data points, the learned weights tend to be quite large. Ridge regression avoids overfitting by limiting the norm of the learned weights: it adds the scaled L2 norm of the weights (beta) to the loss function. The trained model therefore trades off between fitting the data points perfectly (which requires a large weight norm) and keeping the norm of the weights small. The scaling constant alpha > 0 controls this trade-off. A small value of alpha results in larger-norm weights and overfitting of the training data points. A large value of alpha, on the other hand, results in a function that fits the training data points poorly but has a very small weight norm. Choosing the value of alpha carefully yields the best trade-off.
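A minimal sketch of this trade-off, assuming the same kind of synthetic sine-shaped data as in the earlier examples: fit a degree-25 polynomial with ridge regularization and watch the weight norm shrink as alpha grows.

```python
# Ridge regression on a degree-25 polynomial fit with only 10 training points.
# Larger alpha shrinks the weight norm at the cost of a looser training fit.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 10)

for alpha in (1e-6, 1e-3, 1.0):
    model = make_pipeline(PolynomialFeatures(degree=25), Ridge(alpha=alpha))
    model.fit(x, y)
    coefs = model.named_steps["ridge"].coef_
    print(f"alpha={alpha:g}: ||beta||_2 = {np.linalg.norm(coefs):.2f}, "
          f"training R^2 = {model.score(x, y):.4f}")
```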

4. LASSO regression

LASSO regression is similar to ridge regression, as both are used as regularizers against overfitting on the training data points. But LASSO comes with an additional benefit: it enforces sparsity on the learned weights.

Ridge regression enforces the norm of the learned weights to be small, yielding a set of weights whose total norm is reduced; most of the weights (if not all) will be non-zero. LASSO, on the other hand, tries to find a set of weights in which most are exactly (or very close to) zero. This yields a sparse weight vector whose implementation can be much more energy-efficient than a non-sparse one while maintaining similar accuracy in fitting the data points.

The figure below tries to visualize this idea on the same example as above. The data points are fit using both ridge and LASSO regression, and the corresponding fits and weights are plotted in ascending order. It can be seen that most of the weights in the LASSO regression are very close to zero.

Mathematically speaking, LASSO regression solves the following problem by modifying the loss function:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \big(y_i - f_\beta(x_i)\big)^2 + \alpha \|\beta\|_1$$

The difference between LASSO and ridge regression is that LASSO uses the L1 norm of the weights instead of the L2 norm. This L1 norm in the loss function tends to increase the sparsity of the learned weights.

The constant alpha > 0 controls the trade-off between the fit and the sparsity of the learned weights. A large value of alpha results in a poorer fit but a sparser set of learned weights. A small value of alpha, on the other hand, results in a tight fit on the training data points (which might lead to overfitting) but a less sparse set of weights.
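To see the sparsity effect described above, here is a small sketch (same synthetic setup as the ridge example) that counts how many polynomial weights each method leaves non-zero:

```python
# Ridge vs LASSO on the same degree-25 polynomial fit: LASSO zeroes out most weights.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 10)

for name, reg in [("ridge", Ridge(alpha=1e-3)),
                  ("lasso", Lasso(alpha=1e-3, max_iter=200_000))]:
    model = make_pipeline(PolynomialFeatures(degree=25), reg)
    model.fit(x, y)
    coefs = model.named_steps[name].coef_
    nonzero = np.count_nonzero(np.abs(coefs) > 1e-8)
    print(f"{name}: {nonzero} of {coefs.size} weights are non-zero")
```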

5. ElasticNet Regression

ElasticNet regression is a combination of ridge and LASSO regression: the loss term includes both the L1 and L2 norms of the weights, each with its own scaling constant. It is often used to address limitations of LASSO regression, for example that its objective is not strictly convex, so the solution may not be unique when features are highly correlated; the added quadratic penalty on the weights makes the problem strictly convex.

Mathematically speaking, ElasticNet regression solves the following problem by modifying the loss function:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \big(y_i - f_\beta(x_i)\big)^2 + \alpha_1 \|\beta\|_1 + \alpha_2 \|\beta\|_2^2$$
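In scikit-learn the two penalties are exposed through a single alpha and an l1_ratio mixing parameter rather than two separate constants; a minimal sketch on synthetic data:

```python
# ElasticNet: alpha scales the overall penalty, l1_ratio mixes L1 vs L2
# (l1_ratio=1 is pure LASSO, l1_ratio=0 is a pure ridge-style penalty).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 50)   # only 2 informative features

model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("learned weights:", np.round(model.coef_, 3))   # uninformative weights shrink toward zero
```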

6. Bayesian Regression

For the regression methods discussed above (the frequentist approach), the goal is to find a single set of deterministic values for the weights (beta) that explain the data. In Bayesian regression, instead of finding one value for each weight, we instead try to find the distribution of these weights, assuming a prior.

So we start off with an initial distribution of the weights and, based on the data, nudge the distribution in the right direction using Bayes' theorem, which relates the prior distribution to the posterior distribution through the likelihood and the evidence. As the number of data points goes to infinity, the posterior distribution of the weights becomes an impulse at the ordinary least squares solution, i.e. the variance approaches zero.

Finding the distribution of the weights instead of a single set of deterministic values serves two purposes:

1. It naturally guards against the issue of overfitting, hence acting as a regularizer.

2. It provides a confidence range for the weights, which makes more logical sense than returning just one value.

Let us mathematically formulate the problem and state its solution. Assume a Gaussian prior on the weights with mean μ and covariance Σ, i.e.

$$\beta \sim \mathcal{N}(\mu, \Sigma)$$

Based on the available data D, we update this distribution. For the problem at hand (a linear model with Gaussian observation noise of variance $\sigma^2$), the posterior is again a Gaussian distribution, with parameters

$$\Sigma_{\text{post}} = \left(\Sigma^{-1} + \frac{1}{\sigma^2} X^\top X\right)^{-1}, \qquad \mu_{\text{post}} = \Sigma_{\text{post}}\left(\Sigma^{-1}\mu + \frac{1}{\sigma^2} X^\top y\right)$$

where X is the matrix whose rows are the inputs $x_i$ and y is the vector of targets.
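scikit-learn's BayesianRidge implements a closely related model (it additionally places hyperpriors on the noise and weight precisions); a minimal sketch on synthetic data, showing the per-prediction uncertainty the posterior provides:

```python
# Bayesian linear regression sketch with BayesianRidge: the model returns a
# predictive mean and standard deviation instead of a single point estimate.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.3, 40)

model = BayesianRidge().fit(X, y)
mean, std = model.predict([[1.5]], return_std=True)
print(f"prediction at x=1.5: {mean[0]:.3f} +/- {std[0]:.3f}")
print("posterior mean of weight:", round(model.coef_[0], 3),
      "intercept:", round(model.intercept_, 3))
```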
7. Logistic Regression

Logistic regression comes in handy for classification tasks, where the output needs to be the conditional probability of the output class given the input. Mathematically speaking, logistic regression solves the following problem:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} -\left[ y_i \log \sigma\big(f_\beta(x_i)\big) + (1 - y_i) \log\Big(1 - \sigma\big(f_\beta(x_i)\big)\Big) \right], \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

Consider the following example where the data points belong to one of two categories, {0 (red), 1 (yellow)}, as shown in the scatter plot below.

[Left] Scatter plot of data points — [Right] Logistic regression trained on the data points, plotted in blue

Logistic regression uses a sigmoid function at the output of the linear or polynomial function to map the output from (-∞, ∞) to (0, 1). A threshold (usually 0.5) is then used to categorize the test data into one of the two categories.
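A minimal sketch with scikit-learn on synthetic one-dimensional data, showing the sigmoid probability output and the 0.5-threshold class decision:

```python
# Logistic regression: predict_proba gives the sigmoid output in (0, 1);
# predict applies the default 0.5 threshold to pick a class in {0, 1}.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = (x.ravel() + rng.normal(0, 0.5, 100) > 0).astype(int)   # synthetic labels

clf = LogisticRegression().fit(x, y)
print("P(y=1 | x=0.3):", round(clf.predict_proba([[0.3]])[0, 1], 3))
print("predicted class:", clf.predict([[0.3]])[0])
```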

This may make it seem like logistic regression is not regression but a classification algorithm. But that is not the case. You can find out more about it in Adrian's post here.

https://towardsdatascience.com/a-beginners-guide-to-regression-analysis-in-machine-learning-8a828b491bbf
